Why does the file command fail to recognize non-text files as such?

POSIX defines

Text file as

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.

Line as

A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

Character as

A sequence of one or more bytes representing a single graphic symbol or control code.

Consider then six files, each two bytes long, created with these printf commands (using octal escapes):

printf "\101\012" > file1 # A<newline>
printf "\010\012" > file2 # <backspace><newline>
printf "\101\101" > file3 # AA
printf "\200\012" > file4 # <0x80><newline>
printf "\200\200" > file5 # <0x80><0x80>
printf "\000\012" > file6 # <NUL><newline>
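To confirm that each file really holds the intended two bytes, a hex dump helps. The following is a minimal sketch; `hex_bytes` is a hypothetical helper name, and it assumes `od`, `tr` and `mktemp` are available.

```shell
# Print the bytes of a file as one lowercase hex string.
# hex_bytes is an illustrative helper, not a standard utility.
hex_bytes() {
    # -An: no offset column, -v: do not collapse repeated lines,
    # -tx1: one hex byte per unit; tr strips the spacing od adds.
    od -An -v -tx1 "$1" | tr -d ' \n'
}

# Usage: write the same two bytes as file4 to a scratch file and dump them.
tmp=$(mktemp)
printf '\200\012' > "$tmp"
hex_bytes "$tmp"; echo    # prints 800a
rm -f "$tmp"
```

Running `hex_bytes` on each of file1 to file6 shows exactly which bytes the `file` command gets to see.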

Now, in the UTF-8 encoding, octal 012 (0x0A) is the newline character, 101 (0x41) is the graphic symbol A, 010 (0x08) is the backspace control character, and 200 (0x80) is a continuation byte that never occurs as the first byte of a multi-byte sequence, so on its own it does not form a valid character.
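The claim about 0x80 can be checked by asking `iconv` to convert the bytes from UTF-8 to UTF-8, which should fail on an invalid sequence. This is only a sketch: `is_valid_utf8` is a hypothetical helper name, and it assumes an `iconv` implementation that exits with a non-zero status on invalid input.

```shell
# Succeed if standard input is a valid UTF-8 byte sequence.
# Assumption: iconv returns non-zero when it meets an invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1
}

# Usage: a lone continuation byte is rejected, while a proper
# two-byte sequence (0xC3 0xA9) is accepted.
printf '\101\012'     | is_valid_utf8 && echo "41 0a: valid"
printf '\200\012'     | is_valid_utf8 || echo "80 0a: invalid"
printf '\303\251\012' | is_valid_utf8 && echo "c3 a9 0a: valid"
```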

Hence, I would regard files 1 and 2 as text files, but the remaining ones as non-text files: file 3 is not newline-terminated, files 4 and 5 contain an invalid character, and file 6 contains a NUL byte.
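The classification above can be written down as a small shell function. This is a sketch of my reading of the POSIX definitions, not of what `file` does: `is_posix_text` is a hypothetical name, UTF-8 is assumed as the character encoding, and `iconv` is assumed to exit non-zero on invalid input.

```shell
# Return 0 if the file looks like a POSIX text file, non-zero otherwise.
# Illustrative only; assumes UTF-8 encoding and a failing iconv on bad input.
is_posix_text() {
    f=$1
    [ -f "$f" ] || return 2

    # An empty file has zero lines and still qualifies.
    [ -s "$f" ] || return 0

    # 1. No NUL bytes: deleting them must not change the byte count.
    total=$(wc -c < "$f")
    without_nul=$(tr -d '\000' < "$f" | wc -c)
    [ "$total" -eq "$without_nul" ] || return 1

    # 2. The last line must be terminated by <newline> (0x0a).
    last=$(tail -c 1 "$f" | od -An -tx1 | tr -d ' \n')
    [ "$last" = "0a" ] || return 1

    # 3. Every byte sequence must form a valid character.
    iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1 || return 1

    # 4. No line may exceed LINE_MAX bytes, counting the <newline>.
    max=$(getconf LINE_MAX)
    LC_ALL=C awk -v max="$max" 'length($0) + 1 > max { exit 1 }' "$f" || return 1

    return 0
}

# Usage: classify whichever of the six files exist in the current directory.
for f in file1 file2 file3 file4 file5 file6; do
    [ -f "$f" ] || continue
    if is_posix_text "$f"; then echo "$f: text"; else echo "$f: not text"; fi
done
```

With the six files from above, this reports files 1 and 2 as text and the rest as not text, which is the result I expected from `file`.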

However, the file command does not completely agree with me; it lists files 3, 4 and 5 as text files:

$ file --mime-type file*
file1: text/plain
file2: text/plain
file3: text/plain
file4: text/plain
file5: text/plain
file6: application/octet-stream

Why does the file command fail to identify files 3, 4 and 5 as non-text files, even though I use en_US.UTF-8 as my locale? I'm assuming it can't possibly be a bug, so what did I misunderstand?

posix file

posted about 4 years ago

CC BY-SA 4.0

Quasímodo‭
