Simplest way of stripping leading/trailing whitespace from file or program output


What is the simplest shell idiom for stripping leading and trailing whitespace from a file or program output? Ideally I am looking for the equivalent of trim or strip methods in some languages.

The ideal solution should

skip empty lines at the beginning and end of the file/stream
provide an option to also strip leading and trailing whitespace from all non-empty lines

text-processing

posted almost 2 years ago

CC BY-SA 4.0

2y ago by AdminBee‭

matthewsnyder‭


1 comment thread

Do tab spaces count as leading or trailing white space? (2 comments)

3 answers


If I understand you correctly, you want to

skip empty lines at the beginning of a file/stream
strip leading and trailing whitespace of non-empty lines
skip empty lines at the end of a file/stream.

This can be achieved using the following awk program:

awk 'NF==0{if (m) {buf=buf RS}; next}
          {if (!m) {m=1} else {printf "%s",buf;buf=""};sub(/^[[:blank:]]*/,"");sub(/[[:blank:]]*$/,"")}1'

This will do the following

If the line is (visually) empty, i.e. contains only blanks or nothing at all, the program checks whether a position flag m (for "main") is set. If it is, a "newline" is appended to a buffer variable buf. If not, the line is simply ignored. Either way, processing skips to the next line.
If the line was not skipped, it is non-empty.
- In that case, if the flag m was not yet set, the program has now reached the "main" part, so m is set to 1. Otherwise, it was already in the main part, and any buffered empty lines are printed now and the buffer is cleared. If there were no buffered empty lines, buf is empty and nothing is printed, so no harm is done.
- Then use sub() to strip the leading and trailing whitespace from the line.
- The seemingly stray 1 outside of the action blocks instructs awk to print the current line including all modifications made.

This ensures that leading and trailing empty lines are omitted while keeping (but if necessary sanitizing) internal empty lines.

If you want to omit trimming of the non-empty lines, you can simply skip the two calls to sub().

This approach can be made more comfortable by placing the program in an awk script, say strip_wsp.awk:

#!/usr/bin/awk -f

NF==0{
    if (m) {buf=buf RS}
    next
}

{
    if (!m) {m=1} else {printf "%s",buf;buf=""}

    if (trimlines) {
        sub(/^[[:blank:]]*/,"")
        sub(/[[:blank:]]*$/,"")
    }
}

1

This version lets you define an awk variable trimlines which, if set to a non-zero value, trims the non-empty lines; by default they are left untouched.

Applied to a somewhat expanded version of your example string, the result is like this:

$ printf -- "\n \n hello\nmellow \n \n\nworld\n\n\n" | awk -v trimlines=1 -f strip_wsp.awk
hello
mellow


world
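For readers who find the flag-and-buffer logic easier to follow outside of awk, the same idea can be sketched in Python. This is only an illustrative port, not part of the awk answer: the function name strip_stream, the variable names started and pending_blank, and the choice to treat only spaces and tabs as blanks are assumptions made for this sketch.

```python
def strip_stream(lines, trimlines=False):
    """Yield lines with leading/trailing blank lines dropped.

    Interior blank lines are kept, but emitted as truly empty lines,
    mirroring how the awk program buffers one RS per blank line.
    """
    started = False      # plays the role of the awk flag m
    pending_blank = 0    # number of buffered blank lines (awk: buf)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip(" \t"):
            # Blank line: remember it only once the main part has begun.
            if started:
                pending_blank += 1
            continue
        started = True
        # A non-blank line follows, so the buffered blanks were interior ones.
        for _ in range(pending_blank):
            yield ""
        pending_blank = 0
        yield line.strip(" \t") if trimlines else line
```

Because it is a generator, it could be driven line by line from standard input, for example with a loop such as `for line in strip_stream(sys.stdin, trimlines=True): print(line)`. Blank lines still buffered when the input ends are simply never emitted, which is what removes the trailing empty lines.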

posted almost 2 years ago

CC BY-SA 4.0

2y ago

AdminBee‭


The simple and obvious solution:

sed 's/^ *//;s/ *$//'

Many recipes you find online will erroneously add a g flag, but these regular expressions can only match once per line anyway.

(In some more detail, s/from/to/g says to replace all occurrences of from on the current input line; but of course, if you know from can only match once, you don't want or need that. It is harmless as such, of course, but betrays a cargo cultish lack of understanding of the construct.)

Your requirement to treat the first and last lines differently seems odd to me, but sed easily allows you to do that too.

sed '1s/^ *//;$s/ *$//'

This adds the address expression 1 to the first command (which matches on line number 1) and the address $ to the last (which matches the final input line).

If your sed implementation doesn't support stringing multiple commands together with ; as shown above, you can pass in the script piecemeal with multiple -e options.

sed -e '1s/^  *//' -e '$s/ *$//'
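To make the effect of the two addresses concrete, here is a minimal Python sketch of what the addressed commands above do, assuming the input has already been split into lines without their newline terminators. The function name strip_first_and_last is made up for this illustration and is not part of sed.

```python
def strip_first_and_last(lines):
    """Remove leading spaces from the first line and trailing spaces
    from the last line, leaving every other line untouched."""
    out = list(lines)
    if out:
        out[0] = out[0].lstrip(" ")    # address 1: first input line only
        out[-1] = out[-1].rstrip(" ")  # address $: last input line only
    return out
```

With a single-line input both addresses refer to the same line, so that line is stripped on both sides. Note that only spaces on the first and last lines are affected; empty lines at either end of the input are passed through unchanged.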

The regular expressions above specifically target literal spaces. If you want to target any whitespace, replace each literal space in the expressions with [[:space:]], a POSIX character class which matches one whitespace character of any kind (space, tab, etc).
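The difference between matching literal spaces and matching any whitespace can be illustrated with Python's re module. This is a rough sketch only: Python's re does not support the POSIX class [[:space:]], so \s is used here as an approximate stand-in, and the two function names are hypothetical. Passing count=1 mirrors a sed substitution without the g flag.

```python
import re


def trim_spaces_only(line):
    """Roughly what s/^ *//;s/ *$// does: remove literal spaces only."""
    line = re.sub(r"^ *", "", line, count=1)
    return re.sub(r" *$", "", line, count=1)


def trim_any_whitespace(line):
    """Same substitutions with \\s standing in for [[:space:]]."""
    line = re.sub(r"^\s*", "", line, count=1)
    return re.sub(r"\s*$", "", line, count=1)
```

A line such as "\t hello \t" comes back unchanged from trim_spaces_only, because it begins and ends with a tab, whereas trim_any_whitespace reduces it to "hello".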

Something similar could be achieved with Awk with a clever RS (record separator) but I'd consider that more obscure, as well as probably slower.

posted almost 2 years ago

CC BY-SA 4.0

2y ago

tripleee‭


1 comment thread

The OP mentioned whitespace, but this will only remove spaces. Also, you have the portability backwar... (3 comments)


I'll post this as an example of what I'm looking to do.

The following script:

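import sys

# Read all of standard input, then strip whitespace (including blank
# lines) from both ends of the whole text.
a = sys.stdin.read()
b = a.strip()
# Strip each remaining line individually.
c = map(lambda s: s.strip(), b.splitlines())

for s in c:
    print(s)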

Will remove:

Whitespace at the beginning and end of the file or stream
Whitespace at the beginning and end of each line (except for the legitimate line ending, of course)

$ echo -e " hello\nmellow \nworld\n\n\n" | python trim.py | bat -A
───────┬─────────────
       │ STDIN
       │ Size: -
───────┼─────────────
   1   │ hello␊
   2   │ mellow␊
   3   │ world␊
───────┴─────────────

Caveats:

In practice, you would probably want to place this script in a directory on your shell's PATH to use it
It might be worth adding some CLI flags to control which whitespace exactly is removed
The performance (especially memory usage) of this is probably bad: it reads the whole input into memory instead of handling one line at a time (which would require maintaining a "consecutive blanks" buffer in order to drop trailing blank lines)
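As a rough idea of what such CLI flags could look like, here is a sketch built on argparse. Everything named here is hypothetical and chosen only for illustration: the flags --keep-line-whitespace and --chars, and the helpers trim_text, build_parser and main. It still reads the whole input at once, so it does not address the memory caveat.

```python
import argparse
import sys


def trim_text(text, trim_lines=True, chars=None):
    """Drop blank lines at both ends; optionally strip each remaining line.

    chars is passed to str.strip(); None means any whitespace.
    """
    lines = text.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    if trim_lines:
        lines = [line.strip(chars) for line in lines]
    return lines


def build_parser():
    parser = argparse.ArgumentParser(description="Trim whitespace from standard input.")
    parser.add_argument("--keep-line-whitespace", action="store_true",
                        help="only drop blank lines at the start and end")
    parser.add_argument("--chars", default=None,
                        help="characters to strip from each line (default: any whitespace)")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    text = sys.stdin.read()
    for line in trim_text(text, not args.keep_line_whitespace, args.chars):
        print(line)
```

Calling main() at the bottom of the file would turn this into a usable script. For instance, passing --chars ' ' would restrict the per-line trimming to literal spaces.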

This seems like such an obvious task that there must surely be a Unix program for it already. However I could not find anything better than a Python script or sed with a somewhat-complex regex.

posted almost 2 years ago

CC BY-SA 4.0

matthewsnyder‭


1 comment thread

Somewhat complex and inefficient (3 comments)
