Simplest way of stripping leading/trailing whitespace from file or program output

+4
−1

What is the simplest shell idiom for stripping leading and trailing whitespace from a file or program output? Ideally I am looking for the equivalent of trim or strip methods in some languages.

The ideal solution should

  • skip empty lines at the beginning and end of the file/stream
  • provide an option to also strip leading and trailing whitespace from all non-empty lines

1 comment thread

Do tab spaces count as leading or trailing white space? (2 comments)

3 answers

+4
−0

The simple and obvious solution:

sed 's/^ *//;s/ *$//'

Many recipes you find online will erroneously add a g flag, but these regular expressions can only match once per line anyway.

(In some more detail, s/from/to/g says to replace all occurrences of from on the current input line; but of course, if you know from can only match once, you don't want or need that. It is harmless as such, of course, but betrays a cargo cultish lack of understanding of the construct.)
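A quick way to convince yourself, as a small sketch (the sample strings are made up for illustration): with anchored patterns the g flag changes nothing, whereas with an unanchored pattern it does.

```shell
# Anchored patterns: ^ and $ can each match only once per line,
# so the result is the same with or without g.
printf '   a b   \n' | sed 's/^ *//;s/ *$//'     # prints "a b"
printf '   a b   \n' | sed 's/^ *//g;s/ *$//g'   # prints "a b" as well

# Unanchored pattern: here g really does make a difference.
printf 'a b c\n' | sed 's/ //'                   # prints "ab c"
printf 'a b c\n' | sed 's/ //g'                  # prints "abc"
```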

Your requirement to treat the first and last lines differently seems odd to me, but sed easily allows you to do that too.

sed '1s/^ *//;$s/ *$//'

This adds the address expression 1 to the first command (which matches on line number 1) and the address $ to the last (which matches the final input line).
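To illustrate the effect of the two addresses, here is a minimal example with made-up three-line input where every line has two leading and two trailing spaces:

```shell
# Only line 1 loses its leading spaces and only the last line
# loses its trailing spaces; the middle line is left untouched.
printf '  first  \n  middle  \n  last  \n' | sed '1s/^ *//;$s/ *$//'
```

The first line should come out as "first" followed by its two trailing spaces, the middle line unchanged, and the last line as "last" with its two leading spaces intact.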

If your sed implementation doesn't support stringing multiple commands together with ; as shown above, you can pass in the script piecemeal with multiple -e options.

sed -e '1s/^  *//' -e '$s/ *$//'

The regular expressions above specifically target literal spaces. If you want to target any whitespace, replace each literal space with [[:space:]], a bracket expression containing the POSIX character class [:space:], which matches one whitespace character of any kind (space, tab, etc.).
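With every kind of whitespace targeted, the first command could look like the sketch below. The function name trim_lines is made up here purely so the command is easy to reuse; it is not a standard utility.

```shell
# trim_lines (hypothetical name): strip leading and trailing whitespace
# (spaces, tabs, etc.) from every line read on standard input.
trim_lines() {
    sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
}

# Sample input with mixed tabs and spaces around the text.
printf '\t  hello world \t\n  second line\n' | trim_lines
```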

Something similar could be achieved with Awk with a clever RS (record separator) but I'd consider that more obscure, as well as probably slower.


1 comment thread

The OP mentioned whitespace, but this will only remove spaces. Also, you have the portability backwar... (3 comments)
+1
−0

If I understand you correctly, you want to

  • skip empty lines at the beginning of a file/stream
  • strip leading and trailing whitespace of non-empty lines
  • skip empty lines at the end of a file/stream.

This can be achieved using the following awk program:

awk 'NF==0{if (m) {buf=buf RS}; next}
          {if (!m) {m=1} else {printf "%s",buf;buf=""};sub(/^[[:blank:]]*/,"");sub(/[[:blank:]]*$/,"")}1'

This will do the following

  1. If the line is (visually) empty, i.e. contains only whitespace or nothing at all (NF==0), it checks whether a position flag m (for "main") is set. If set, a "newline" is added to a buffer variable buf. If not, the line is simply ignored and processing skips to the next line.
  2. If the line was not skipped, it is non-empty.
    • In that case, if the flag m was not yet set the program has now reached the "main" part, so m is set to 1. Otherwise, it was already in the main part, and any buffered empty lines need to be printed now (but the buffer cleared). If there were no buffered empty lines, buf will be empty and nothing printed, hence no harm done.
    • Then use sub() to strip the leading and trailing whitespace from the line.
    • The seemingly stray 1 outside of the action blocks instructs awk to print the current line including all modifications made.

This ensures that leading and trailing empty lines are omitted while keeping (but if necessary sanitizing) internal empty lines.

If you want to omit trimming of the non-empty lines, you can simply skip the two calls to sub().
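As a sketch, the variant without the sub() calls might look as follows. The shell function name strip_blank_edges is an assumption made for illustration; the awk variables m and buf are the ones used above.

```shell
# strip_blank_edges (hypothetical name): drop empty lines at the start
# and end of the input, but leave non-empty lines exactly as they are.
strip_blank_edges() {
    awk 'NF==0 { if (m) buf = buf RS; next }
         { if (m) { printf "%s", buf; buf = "" } else m = 1 }
         1'
}

printf '\n \n hello\nmellow \n \n\nworld\n\n\n' | strip_blank_edges
```

Note that the leading space of " hello" and the trailing space of "mellow " survive, while the whitespace-only line in the middle is still reduced to a plain empty line, because only a record separator is buffered for it.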


This approach can be made more comfortable by placing the program in an awk script, say strip_wsp.awk:

#!/usr/bin/awk -f

NF==0{
    if (m) {buf=buf RS}
    next
}

{
    if (!m) {m=1} else {printf "%s",buf;buf=""}

    if (trimlines) {
        sub(/^[[:blank:]]*/,"")
        sub(/[[:blank:]]*$/,"")
    }
}

1

This version lets you define an awk variable trimlines which, if set to a non-zero value, trims non-empty lines; by default they are left untouched.

Applied to a somewhat expanded version of your example string, the result is like this:

$ printf -- "\n \n hello\nmellow \n \n\nworld\n\n\n" | awk -v trimlines=1 -f strip_wsp.awk
hello
mellow


world

0 comment threads

+0
−0

I'll post this as an example of what I'm looking to do.

The following script:

import sys

a = sys.stdin.read()
b = a.strip()
c = map(lambda s: s.strip(), b.splitlines())

for s in c:
    print(s)

Will remove:

  • Whitespace at the beginning and end of the file or stream
  • Whitespace at the beginning and end of each line (except for the legitimate line ending, of course)

$ echo -e " hello\nmellow \nworld\n\n\n" | python trim.py | bat -A
───────┬─────────────
       │ STDIN
       │ Size: -
───────┼─────────────
   1   │ hello␊
   2   │ mellow␊
   3   │ world␊
───────┴─────────────

Caveats:

  • In practice, you would probably want to place this script in a directory on your PATH to use it
  • It might be worth adding some CLI flags to control which whitespace exactly is removed
  • The performance (especially memory usage) is probably poor: it reads the whole input into memory instead of handling one line at a time while maintaining a "consecutive blanks" buffer for dropping trailing empty lines
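The flag and streaming caveats could be addressed along the lines of the sketch below, which wraps a line-at-a-time awk program in a shell function. The function name trim, the -l flag, and the awk variable names are all made up for illustration, and only spaces and tabs are treated as whitespace within a line.

```shell
# trim (hypothetical): drop empty lines at the start and end of the input.
# With -l, also strip leading/trailing spaces and tabs from each line.
trim() {
    trimlines=0
    if [ "${1-}" = "-l" ]; then
        trimlines=1
        shift
    fi
    awk -v trimlines="$trimlines" '
        NF == 0 { if (seen) buf = buf RS; next }   # hold back blank lines
        {
            # Flush held-back blank lines only once more text follows.
            if (seen) { printf "%s", buf; buf = "" } else seen = 1
            if (trimlines) { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        }
        1' "$@"
}

printf '\n  a  \n\n b\n\n' | trim -l
```

Only the pending run of blank lines is kept in memory, so the input is processed one line at a time.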

This seems like such an obvious task that there must surely be a Unix program for it already. However, I could not find anything better than a Python script or sed with a somewhat complex regex.


1 comment thread

Somewhat complex and inefficient (3 comments)
