Comments on What unexpected things can happen if a user runs commands expecting a text file on input lacking a file-final newline?

Post

What unexpected things can happen if a user runs commands expecting a text file on input lacking a file-final newline?

+7
−0

It is often taught that in Unix/Linux text files should end with newline characters. The reason given (orally) to me by various sources was that "some commands (such as wc) assume or require a newline at the end of a text file". (Other commands I am aware of are: sed, read, cron/crontab, and (in terms of output:) sort.) This led me to ask the following question:

Which commands in Unix/Linux assume that text files end in a newline?

It seems that the proper answer to that question is something like this:

all commands that require a "text file" according to the "INPUT FILES" section of their POSIX specification

This is because POSIX-compliant text files should end in a newline character '\n' (U+000A). (The rationale has been discussed at length on Stack Overflow and is not the subject of this question.)

That is, if I understand correctly, if one only works within Unix & Linux, one only rarely encounters files with textual data that don't end in a newline. Furthermore, the statement that "some commands assume or require a newline at the end of a text file" is misleading, because in fact all commands that manipulate text files have such a requirement, simply because having a file-final newline is a POSIX requirement for text files.
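To illustrate why wc is the usual example: POSIX specifies that wc -l reports the number of newline characters, not the number of "visual" lines. The following is a minimal Python sketch of that counting rule, not taken from any implementation's source; the function name and sample data are made up for illustration.

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is how `wc -l` defines its line count."""
    return data.count(b"\n")


# Hypothetical sample inputs: the same two lines, with and without
# the file-final newline.
complete = b"alpha\nbeta\n"
defective = b"alpha\nbeta"

print(count_newlines(complete))   # both lines are terminated
print(count_newlines(defective))  # the unterminated last line is not counted
```

Under this rule the input without a file-final newline is reported as having one line fewer, even though a human reader sees the same two lines.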

With all that said, I would hereby like to ask:

What are some unexpected things that can happen if a Unix/Linux user forgets to check textual input to commands for file-final newlines?

To elaborate on "unexpected", let <lines> denote a sequence of POSIX lines and <def-line> a defective line (defined as the result of removing the final newline character from a non-empty POSIX line). In what follows, <def-line> thus represents the trailing characters after the input's last newline.

  • Let us assume that the expected behavior is
    f(<lines><def-line>) = f(<lines><def-line>\n) or
    f(<lines><def-line>) ≅ f(<lines><def-line>\n).
    Here, approximate equality (≅) covers the case of "the same" newline being missing from the output on the left-hand side. Unexpected behavior would then be
    f(<lines><def-line>) ≠ f(<lines><def-line>\n) or
    f(<lines><def-line>) ≇ f(<lines><def-line>\n).
  • A special case of this is
    f(<lines><def-line>) = f(<lines>).
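The cases above can be made concrete with a small Python sketch that treats a command as a function f from input text to output text. All names here (classify, complete_lines_only) are hypothetical and exist only for illustration. complete_lines_only is a stand-in for any tool that silently ignores an unterminated last line; it is not a model of a specific real command.

```python
from typing import Callable


def classify(f: Callable[[str], str], lines: str, def_line: str) -> str:
    """Compare f on <lines><def-line> with f on <lines><def-line>\\n."""
    without = f(lines + def_line)
    with_nl = f(lines + def_line + "\n")
    if without == with_nl:
        return "equal"
    if without + "\n" == with_nl:
        # Only "the same" newline is missing from the output.
        return "approximately equal"
    if without == f(lines):
        # Special case: the defective line had no effect at all.
        return "last line dropped"
    return "different"


def complete_lines_only(text: str) -> str:
    """Hypothetical tool that only processes newline-terminated lines."""
    parts = text.splitlines(keepends=True)
    return "".join(p for p in parts if p.endswith("\n"))


print(classify(str.upper, "a\n", "b"))
print(classify(complete_lines_only, "a\n", "b"))
```

A tool that behaves like str.upper here falls into the expected (approximately equal) category, while one that behaves like complete_lines_only is an instance of the special case f(<lines><def-line>) = f(<lines>).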

The answer needn't be a complete list; a list of common "gotchas" might be enough to teach a new user to be careful to check for text-file-final newlines. Historical gotchas are welcome; I am aware that commands change over time (and also exist in different implementations).


Here are some realistic sources of files that are POSIX-compliant text files, except for the fact that they are missing a file-final newline:

  • text files saved with various text editors that don't enforce a file-final newline, such as Sublime Text
  • code saved with IDEs that don't enforce a file-final newline
  • files with textual data from governments, used for research purposes
  • files from other OSes that were transferred in a way that doesn't append missing file-final newlines
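For files from sources like these, a quick check of the last byte is enough to detect the problem before feeding the file to other tools. The following Python sketch is one possible way to do that; the function name and the temporary demo files are assumptions made for illustration. An empty file is treated as fine, since it contains zero lines and nothing is left unterminated.

```python
import os
import tempfile


def ends_with_newline(path: str) -> bool:
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return True  # zero lines, so no line is missing its newline
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"


# Demonstration on two throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.txt")
    bad_path = os.path.join(tmp, "bad.txt")
    with open(good_path, "wb") as fh:
        fh.write(b"first\nsecond\n")
    with open(bad_path, "wb") as fh:
        fh.write(b"first\nsecond")
    good_result = ends_with_newline(good_path)
    bad_result = ends_with_newline(bad_path)

print(good_result, bad_result)
```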

Note also that word processors in general don't enforce document-final newlines, so users who are new to handling text files in Unix/Linux might find it surprising that text-file-final newlines are a POSIX requirement. They probably haven't given this issue much thought and will (if questioned) likely guess that a text file is either

  1. lines separated by newline characters or
  2. what POSIX defines, except with the last line's final newline character being optional.
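The gap between guess 1 (newline as a separator between lines) and the POSIX definition (newline as a terminator of each line) can be shown with a short Python sketch. The function names are illustrative only.

```python
def split_as_separator(text: str) -> list:
    """Guess 1: lines are whatever lies between newline characters."""
    return text.split("\n")


def split_as_terminator(text: str) -> tuple:
    """POSIX view: each line ends in a newline; the rest is leftover."""
    *complete, leftover = text.split("\n")
    return complete, leftover


terminated = "one\ntwo\n"
unterminated = "one\ntwo"

# Under the separator view, a file-final newline yields an extra empty "line".
print(split_as_separator(terminated))
# Under the terminator view, a missing file-final newline leaves text
# that belongs to no complete line.
print(split_as_terminator(unterminated))
```

Under the separator view, a properly terminated file appears to have an extra empty last line; under the terminator view, an unterminated file has trailing characters that are not a line at all. Tools written with one view in mind can therefore surprise users who assume the other.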

It is easy to point users to the standard and state that behavior on malformed input is undefined, but in reality:

  • tools don't behave randomly on malformed input (they certainly don't normally erase your hard drive),
  • most users learn Unix/Linux by example and by trial-and-error, not by reading standards documents, and
  • many tools make allowances for malformed input, because it occurs in the wild – perhaps tools should emit warnings more often, but that is another topic.

I firmly believe that people should be using commands with properly formatted input, and I am not endorsing usage of POSIX commands in a non-POSIX-conformant way. Perhaps the actual issue is that POSIX conventions aren't taught well. However, a list of pitfalls might help casual users of Unix/Linux learn about what could go (or already might have gone) wrong when input isn't properly formatted.

3 comment threads

My question is verbose because it was first asked on *Stack Exchange: Unix & Linux*, where it was clo... (5 comments)
Agreed, but too much of a rant (3 comments)
Nothing bad happens, it's a made up problem (1 comment)
My question is verbose because it was first asked on Stack Exchange: Unix & Linux, where it was closed.

matthewsnyder wrote 10 months ago

Uh, wait, how does being closed on SX cause it to be verbose? Is it because they nagged the original asker into adding a lot of detail? I think the whole point of this site is that the moderation policy is different than SX, so it might be better to ask the question the way it should have been asked if not for interference from SX users.

And btw I do think this is a lot of text just to ask "Will anything bad happen if I don't do terminal newlines?". For comparison, here's a similar question I asked, which is much shorter: https://linux.codidact.com/posts/288401 (Even that one is too long, IMO, but I was too lazy to shorten)

Well, some commenters argued that the original short version of my question was unanswerable because you'd have to check every implementation of every Unix/Linux command, because, after all, implementations are free to do whatever they want with malformed input. So I basically added justification for why someone might be wondering about this.

matthewsnyder wrote 10 months ago · edited 10 months ago

some commenters argued that the original short version was unanswerable because you'd have to check every implementation of every Unix/Linux command

🙄

It sounds like your question would be satisfied with just some general rules of thumb + a few illustrative real world examples, and you're not really requesting a comprehensive inventory of every single possible issue, no?

Lover of Structure wrote 10 months ago · edited 10 months ago

That would already be very much welcomed (but of course the more the better).