What is cat abuse/useless use of cat?
Sometimes I share Unix commands online, and people chastise me for "useless use of cat" (UUOC) or "cat abuse".
My cat is quite comfy and doing very well, thank you.
What are they talking about?
3 answers
UUOC is an ancient Unix yarn. I can't find the original essay (I believe from Usenet, where else...) but if memory serves it dates from the early 90s or before.
`cat` is actually a program for concatenating files. `cat file1 file2 ...` will give you `file1+file2+file3`. Together with `split`, this is a primitive but effective system for partitioning and reassembling files. And when the files are text, it's a useful tool for transforming the text in various ways.
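A minimal sketch of that partition-and-reassemble round trip, assuming a POSIX shell with `mktemp`, `split` and `cmp` available. The file names (`original.txt`, `part_`, `rebuilt.txt`) are made up for illustration.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small six-line sample file (hypothetical name).
printf 'line %s\n' 1 2 3 4 5 6 > original.txt

# Partition it into 2-line pieces: part_aa, part_ab, part_ac.
split -l 2 original.txt part_

# Reassemble: the glob expands in sorted order, so the pieces line up.
cat part_* > rebuilt.txt

# cmp is silent and exits 0 when the two files are byte-for-byte identical.
cmp original.txt rebuilt.txt && echo "round trip OK"
```

This is one of the cases where `cat` is doing its actual job: there really are several inputs to concatenate.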
Of course `cat file1` does nothing. It just gives the file back as is. Useful though concatenation is, many people rarely use it, and a lot of newbies acquire the misapprehension that `cat file1` means "print `file1`", and by the time they see `cat` actually used with multiple arguments and discover the truth, the muscle memory has long set in.
Unix really does have a way to "print `file1`" which is `<file1` - the brother of "write to `file1`" which is `>file1`. See? Perfectly logical! Of course, `grep cookies <file1 | wc` looks ridiculous, so even after learning this, people don't want to do it. Ditto for `grep cookies file1` - yes, `grep` can take input not only on STDIN... And yet, why bother, when `cat file1 | grep cookies | wc` looks so much neater?
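To make the comparison concrete, here is a small sketch that runs the spellings mentioned above against the same file and shows they count the same matches. The sample file and its contents are invented for the example, and `wc -l` is used instead of bare `wc` to keep the output to one number.

```shell
# Scratch file with two lines that mention cookies (hypothetical data).
workdir=$(mktemp -d)
printf 'cookies\nmilk\nmore cookies\n' > "$workdir/file1"

# 1. The "UUOC" spelling: cat feeds grep through a pipe.
with_cat=$(cat "$workdir/file1" | grep cookies | wc -l)

# 2. Input redirection: the shell opens the file as grep's standard input.
with_redirect=$(grep cookies < "$workdir/file1" | wc -l)

# 3. The redirection may also be written first, so the line still reads left to right.
redirect_first=$(< "$workdir/file1" grep cookies | wc -l)

# 4. grep opens the file itself when given a file name argument.
with_argument=$(grep cookies "$workdir/file1" | wc -l)

echo "$with_cat $with_redirect $redirect_first $with_argument"
```

All four variables hold the count 2 (some `wc` implementations pad the number with leading spaces).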
The problem with this so-called UUOC (or "cat abuse", someone must have felt mighty clever coming up with that one) is that you run one more program. Another thing newbies don't realize is that programs in a Unix pipeline don't run one after the other - they run all at once and process input concurrently - just like a pipeline, see? So with UUOC you add an extra program, `cat`, to the pipeline and waste CPU cycles. But if you did `<file1`, the shell opens the file and the program reads it directly, so you don't waste the CPU cycles; instead you waste brain cycles trying to remember what `<` is and why it's pointing in the wrong direction.
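A quick way to see that the stages of a pipeline run at the same time is to put a program that never finishes at the front. This sketch relies only on the standard `yes` and `head` utilities.

```shell
# `yes` prints "y" forever. If the stages ran one after the other,
# `head` would never get a turn and this line would hang.
# Instead head reads three lines and exits, and `yes` is then stopped
# when it tries to write into the closed pipe.
first_three=$(yes | head -n 3)

printf '%s\n' "$first_three"
```

The command returns immediately with three lines of `y`, which is only possible because both programs were running concurrently.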
How many CPU cycles? Actually very few. `cat` is a very efficient program. If you're doing only `cat file1` to see the file, your computer is a mega-turbo-overkill for doing that, `cat` or not. You'll be bottlenecked by the speed of printing the characters out on your screen, not `cat`. If you're using it as the first step of a processing pipeline, probably all the other programs are either simple, in which case it's still mega-turbo-overkill (or hey, maybe just turbo-overkill), or they're heavy, and what counts as "heavy" for a computer today is a gazillion times more than the burden of running little old `cat`, so it doesn't matter.
But people looove pointing out cat abuse. You gotta admit, it has a real satisfying feel to gotcha someone with that one. "What are you concatenating?" - ha ha! So often, the argument will go on to reveal more pearls of Unix wisdom:
- "But what if you were on a tiny computer, like an SBC?" indeed, if you're running pipelines on puny little micro-computers, mayhaps the humble `cat` will trip you up (aaah...).
- "But what if the file was huge, and you needed to do seeks in it?" indeed, some programs like `less` are zippy and can skip right to line 813,745,714 of that mammoth file, without first going through the eight hundred thirteen million seven hundred forty five thousand seven hundred thirteen preceding lines. How clever! The next time you use `less` to get the one line from the middle of your 5 petabyte diary, watch out for that dastardly `cat`.
- There are surely many others - mine own beard is but a few fingers' length, so surely there are wiser heads with greyer growths than me that see further and deeper.
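The seeking point can be illustrated by asking what kind of thing standard input is in each case. This is a rough sketch, assuming a system that exposes `/dev/stdin` (Linux, macOS and the BSDs do); `kind_of_stdin` and the file name are made up for the example.

```shell
# Report what standard input is connected to (hypothetical helper).
kind_of_stdin() {
    if [ -p /dev/stdin ]; then
        echo "pipe"            # sequential only, cannot seek
    elif [ -f /dev/stdin ]; then
        echo "regular file"    # seekable, a program can jump around in it
    else
        echo "other"
    fi
}

workdir=$(mktemp -d)
printf 'dear diary\n' > "$workdir/diary.txt"

# Through cat, the reader only ever sees a pipe.
via_cat=$(cat "$workdir/diary.txt" | kind_of_stdin)

# With a redirect, the reader holds the file itself.
via_redirect=$(kind_of_stdin < "$workdir/diary.txt")

echo "cat: $via_cat / redirect: $via_redirect"
```

A program that wants to skip ahead can only do so in the second case; behind `cat` it has to read everything that comes before.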
But, probably you encountered this while asking for help on a forum. You posted some logs to show where the problem is. And you thought you'd be extra helpful and include your command, so they can tell exactly where you're getting the 1 MB log and what trivial filtering you're insulting your 16-core gigahertz processor's intelligence with. "They can reproduce it on their own machine that way!"
I choose to look past the semantics in such things, and examine the pragmatics. Is the helpful user really trying to micromanage your simple, lazy log read? Are they really trying to save you that 5 bytes of memory? Well, probably not. I mean, you think what, he's stupid? He knows it doesn't matter! He's trying to communicate something to you. Namely, that he is not stupid, but sharper, more keen of eye, more discerning of the smallest ripple on the boundless Ocean that is the modern Unix system. Recognize the mastery of this o-sensei and bask in his brilliance!
As for the `cat`, it's probably fine, don't worry about it. Just don't tell the guy you're probably gonna keep doing it anyway.
1 comment thread
Overview
A "useless use" or "abuse" of `cat` occurs when a Unix pipeline (sequence of commands that feed into each other, using the shell `|` or "pipe" operator) includes a call to `cat` that is unnecessary for solving the problem.
Such pipelines are naturally less efficient (since the OS has to start an additional `cat` process) and require a bit more typing, but these things are unlikely to matter much in normal circumstances. The more interesting impact is social: some newer users may find the version of a pipeline with useless `cat`s easier to understand, while more experienced users may find that it violates their aesthetic sense and betrays a lack of comfort with how the tools are intended to work. Learning how to remove unnecessary `cat` uses from a pipeline can be a helpful exercise for improving one's shell programming skills.
What is `cat`, and how is it used in pipelines?
The `cat` program is used to con`cat`enate one or more sources of input - i.e., read the input sources and output all the content of each, one after the other in sequence. These sources can be either files or the standard input.
In particular, `cat` can be given a single input which it will simply output on its standard output. (This is different from `echo`, which will output content directly from its command line, rather than from sources named there.) Running `cat` by itself will treat its standard input as the single input used; interactively, this results in a prompt where everything you type is simply echoed back to you (a second time, on top of the usual terminal echo) until you use control-D to mark the end of standard input (or force the program to quit with control-C or in some other way).
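As a small illustration of mixing files and standard input, the sketch below uses the conventional `-` operand, which `cat` treats as "read standard input here". The file names and contents are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'first\n' > "$workdir/a.txt"
printf 'last\n'  > "$workdir/b.txt"

# Three sources in sequence: a file, standard input (the "-"), another file.
combined=$(printf 'middle\n' | cat "$workdir/a.txt" - "$workdir/b.txt")

printf '%s\n' "$combined"
```

The output is `first`, `middle`, `last` on three lines: here the `cat` in the pipeline is genuinely concatenating, so it is not a useless use.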
Why are the unnecessary uses unnecessary?
It's common to see pipelines which use `cat` to open a single file and provide it to the standard input of another program, for example `wc -l`. This is usually unnecessary because the other program already accepts filename arguments, and could open and read the file itself (and use that input instead of the standard input).
In more extreme cases, `cat` by itself does nothing useful in a pipeline. It's like adding an extra length of pipe to a physical pipeline - it does nothing to change the logic of where or when the water flows, or how it splits off or rejoins. For example, `cat | wc -l` is a more wasteful way to write `wc -l`, because the latter would already read from standard input and count the lines.
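The following sketch lines up the variants discussed in this section using `wc -l` on a three-line file. The file name is hypothetical. One visible difference worth knowing: when `wc` is given a file name it prints that name next to the count, while the standard-input forms print the count alone.

```shell
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/notes.txt"

# cat opens the file and pipes it to wc.
piped=$(cat "$workdir/notes.txt" | wc -l)

# An extra cat in the middle is the "extra length of pipe": same answer.
extra_pipe=$(cat "$workdir/notes.txt" | cat | wc -l)

# The shell opens the file as wc's standard input, no cat involved.
redirected=$(wc -l < "$workdir/notes.txt")

# wc opens the file itself; the output also includes the file name.
named=$(wc -l "$workdir/notes.txt")

echo "piped=$piped extra_pipe=$extra_pipe redirected=$redirected"
echo "named=$named"
```

All variants count 3 lines. If a script depends on getting only the number, the redirect form is the drop-in replacement for the `cat` version.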
0 comment threads
Especially in a pedagogical context, the issue with something like `cat /dev/random | head -c 20` versus the more straightforward `head -c 20 /dev/random` is that it communicates that extra ceremony is necessary. It isn't. Using one program instead of two isn't about saving kilobytes of computer memory; it's about saving human thought.
Different brains are going to work differently, and if yours is ridged such that you need that extra program in there, have a great time with that. But it's simply a fact that the `cat` is useless in that context (even a program that only accepts input from stdin can be written `fooprog </dev/random`), and most people in tech culture prefer to remove useless components from their tech.
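For a real program that only reads standard input, `tr` is a convenient stand-in for the hypothetical `fooprog`: it takes no file name arguments at all. A minimal sketch, with an invented sample file:

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/greeting.txt"

# tr cannot open files itself, so the file must arrive on standard input.
# With cat:
via_cat=$(cat "$workdir/greeting.txt" | tr '[:lower:]' '[:upper:]')

# With a redirect, one process fewer:
via_redirect=$(tr '[:lower:]' '[:upper:]' < "$workdir/greeting.txt")

echo "$via_cat $via_redirect"
```

Both forms print `HELLO`, so even for stdin-only tools the redirect covers what the extra `cat` was doing.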
0 comment threads