Simplest way of stripping leading/trailing whitespace from file or program output
What is the simplest shell idiom for stripping leading and trailing whitespace from a file or program output? Ideally I am looking for the equivalent of the trim or strip methods found in some languages.
The ideal solution should
- skip empty lines at the beginning and end of the file/stream
- provide an option to also strip leading and trailing whitespace from all non-empty lines
3 answers
I'll post this as an example of what I'm looking to do.
The following script:
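import sys

# Read the whole input, then strip whitespace (including blank lines) at both ends
a = sys.stdin.read()
b = a.strip()
# Strip leading and trailing whitespace from each remaining line
c = map(lambda s: s.strip(), b.splitlines())
for s in c:
    print(s)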
Will remove:
- Whitespace at the beginning and end of the file or stream
- Whitespace at the beginning and end of each line (except for the legitimate line ending, of course)
$ echo -e " hello\nmellow \nworld\n\n\n" | python trim.py | bat -A
───────┬─────────────
│ STDIN
│ Size: -
───────┼─────────────
1 │ hello␊
2 │ mellow␊
3 │ world␊
───────┴─────────────
Caveats:
- In practice, you would probably want to put this script somewhere on your shell's PATH to use it
- It might be worth adding some CLI flags to control exactly which whitespace is removed
- The performance (especially memory usage) of this is probably bad: it reads the whole input into memory instead of handling one line at a time while maintaining a "consecutive blanks" buffer for removing trailing whitespace
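The last caveat could be addressed roughly as follows. This is only a minimal sketch: the function name trim_stream and its trim_lines parameter are made up for illustration and are not part of the script above. It handles one line at a time and keeps just a count of consecutive blank lines, emitting them only once a later non-blank line shows that they were not trailing.

```python
from typing import Iterable, Iterator


def trim_stream(lines: Iterable[str], trim_lines: bool = True) -> Iterator[str]:
    """Yield lines without leading/trailing blank lines, one line at a time.

    Hypothetical helper for illustration; interior blank lines are kept
    (as empty strings), and non-blank lines are stripped if trim_lines is set.
    """
    seen_content = False  # becomes True once the first non-blank line is seen
    pending_blanks = 0    # blank lines seen since the last non-blank line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: only remember it if we are past the leading blanks.
            if seen_content:
                pending_blanks += 1
            continue
        # A non-blank line proves the buffered blanks were interior: emit them.
        for _ in range(pending_blanks):
            yield ""
        pending_blanks = 0
        seen_content = True
        yield line.strip() if trim_lines else line
```

To use it as a filter, one could loop over trim_stream(sys.stdin) and print each yielded line. Memory use then depends on the longest line rather than on the size of the whole input.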
This seems like such an obvious task that there must surely be a Unix program for it already. However, I could not find anything better than a Python script or sed with a somewhat complex regex.
The simple and obvious solution:
sed 's/^ *//;s/ *$//'
Many recipes you find online will erroneously add a g flag, but these regular expressions can only match once per line anyway.
(In some more detail, s/from/to/g says to replace all occurrences of from on the current input line; but of course, if you know from can only match once, you don't want or need that. It is harmless as such, of course, but betrays a cargo-cultish lack of understanding of the construct.)
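The point about the g flag can be illustrated with a rough Python analogue. This is not sed itself: re.sub with count=1 only approximates a substitution without g, and count=0 approximates one with g. The helper names sub_first and sub_all are invented for this sketch. For the two anchored patterns the results are identical, while for an unanchored pattern the flag does make a difference.

```python
import re


def sub_first(pattern: str, line: str) -> str:
    # Roughly like sed 's/pattern//' (no g flag): delete the first match only.
    return re.sub(pattern, "", line, count=1)


def sub_all(pattern: str, line: str) -> str:
    # Roughly like sed 's/pattern//g': delete every match on the line.
    return re.sub(pattern, "", line, count=0)


line = "  a  b  "

# Anchored patterns: both variants produce the same string.
print(sub_first(r"^ *", line) == sub_all(r"^ *", line))
print(sub_first(r" *$", line) == sub_all(r" *$", line))

# Unanchored pattern: here "replace all" really does change the outcome.
print(sub_first(r" +", line) == sub_all(r" +", line))
```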
Your requirement to treat the first and last lines differently seems odd to me, but sed easily allows you to do that too.
sed '1s/^ *//;$s/ *$//'
This adds the address expression 1 to the first command (which matches on line number 1) and the address $ to the last (which matches the final input line).
If your sed implementation doesn't support stringing multiple commands together with ; as shown above, you can pass in the script piecemeal with multiple -e options.
sed -e '1s/^ *//' -e '$s/ *$//'
The regular expressions above specifically target literal spaces. If you want to target any whitespace, replace each literal space in the patterns with [[:space:]], a POSIX character class that matches one whitespace character of any kind (space, tab, etc.).
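The difference between matching literal spaces and matching any whitespace can be sketched in Python as well. The function names below are hypothetical, each line is assumed to have had its newline removed already (as sed does before applying the script), and Python's \s is used as a stand-in for [[:space:]]; for Unicode strings \s may match somewhat more characters than the POSIX class does.

```python
import re


def trim_spaces(line: str) -> str:
    # Analogue of sed 's/^ *//;s/ *$//': removes literal spaces only.
    return re.sub(r" *$", "", re.sub(r"^ *", "", line))


def trim_whitespace(line: str) -> str:
    # Analogue of the [[:space:]] variant: removes spaces, tabs, and so on.
    return re.sub(r"\s*$", "", re.sub(r"^\s*", "", line))


sample = "\t hello \t"
print(repr(trim_spaces(sample)))      # the tabs at both ends block the match
print(repr(trim_whitespace(sample)))  # tabs and spaces are both removed
```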
Something similar could be achieved with Awk with a clever RS (record separator) but I'd consider that more obscure, as well as probably slower.
If I understand you correctly, you want to
- skip empty lines at the beginning of a file/stream
- strip leading and trailing whitespace of non-empty lines
- skip empty lines at the end of a file/stream.
This can be achieved using the following awk program:
awk 'NF==0{if (m) {buf=buf RS}; next}
{if (!m) {m=1} else {printf "%s",buf;buf=""};sub(/^[[:blank:]]*/,"");sub(/[[:blank:]]*$/,"")}1'
This will do the following:
- If the line is (visually) empty, i.e. contains only whitespace or nothing at all, it checks whether a position flag m (for "main") is set. If set, a newline (RS) is appended to the buffer variable buf; if not, the line is simply ignored. Either way, next skips processing to the next input line.
- If the line was not skipped, it is non-empty.
  - In that case, if the flag m was not yet set, the program has now reached the "main" part, so m is set to 1. Otherwise, it was already in the main part, and any buffered empty lines need to be printed now (and the buffer cleared). If there were no buffered empty lines, buf will be empty and nothing is printed, hence no harm done.
  - Then sub() is used to strip the leading and trailing whitespace from the line.
  - The seemingly stray 1 outside of the action blocks instructs awk to print the current line including all modifications made.
This ensures that leading and trailing empty lines are omitted while keeping (but if necessary sanitizing) internal empty lines.
If you want to omit trimming of the non-empty lines, you can simply leave out the two calls to sub().
This approach can be made more comfortable by placing the program in an awk script, say strip_wsp.awk:
#!/usr/bin/awk -f
NF==0{
    if (m) {buf=buf RS}
    next
}
{
    if (!m) {m=1} else {printf "%s",buf;buf=""}
    if (trimlines) {
        sub(/^[[:blank:]]*/,"")
        sub(/[[:blank:]]*$/,"")
    }
}
1
This version allows you to define an awk variable trimlines which, if set to non-zero, trims non-empty lines; by default they are left untouched.
Applied to a somewhat expanded version of your example string, the result is like this:
$ printf -- "\n \n hello\nmellow \n \n\nworld\n\n\n" | awk -v trimlines=1 -f strip_wsp.awk
hello
mellow


world