How to run a command on a list of files?
Suppose I have a list of files on standard input. These may be the output of find, of cat filelist.txt, or of something else.
How can I run a command on each file in turn?
3 answers
There are several options, like xargs and for. I'll leave those for other answers and only describe my favorite, GNU Parallel. You will have to install it separately, because unlike the inferior xargs it usually does not come pre-installed.
Parallel can take a list of files on STDIN and run a command on them.
# Count lines in each file under current directory using "wc FILENAME"
find | parallel wc {}
Actually, it can also take the list from other sources, like CLI arguments.
# Apply wc to only txt files in current dir (not subdirs)
parallel wc {} ::: *.txt
It supports various ways of manipulating the filename:
# Rename foo.JPG files to foo.jpg
find | grep '\.JPG$' | parallel mv {} {.}.jpg
As you can see, you can do any kind of manipulation you want on the file list before you pipe it to Parallel.
Parallel also supports many other features like parallelization, grouping output, a dry-run mode, and so on. See man parallel for these.
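For instance, a quick sketch of two of those features (assuming GNU Parallel is installed; --dry-run prints the commands instead of running them, and -j caps the number of concurrent jobs):

```shell
# Preview what Parallel would run, without executing anything
find . -name '*.log' | parallel --dry-run gzip {}

# Actually run it, at most 4 jobs at a time
find . -name '*.log' | parallel -j4 gzip {}
```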
Note that normally, these tools assume one file per line (delimited by \n). If you have exotic filenames, such as those with \n in the name, this won't work right. There are various workarounds for this, for example GNU find's -print0, fd's --print0, or parallel's --null. You can find these in the man pages of the tools you use.
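A minimal sketch of the null-delimited variant (assumes GNU find and xargs; parallel --null works the same way on the consuming side):

```shell
# -print0 emits NUL-terminated names; -0 tells xargs to split on NUL,
# so spaces and even newlines in filenames pass through intact.
find . -type f -print0 | xargs -0 wc -c
```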
I personally prefer a workaround called "don't put newlines in filenames, and if you find any, delete them on sight, uninstall the program that made them, and yell at the developer in the issue tracker".
- If I just used find to generate a list of files, then find's -exec argument is usually the way to run some other program on each file found. If you pipe the command to xargs, note that -P n will run up to n commands in parallel. The best value of n will depend on the relative usage of your CPU and your storage system.
- If I have a program (say, generate_lists) that generates a list of files, then for filename in $(generate_lists); do some_program "$filename"; done is usually helpful. Make sure you quote your use of $filename -- more of them have spaces than you'd think.
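The first approach might look like this in practice (a sketch; -P is a GNU/BSD xargs extension, and the file patterns are just placeholders):

```shell
# find's -exec: run a program once per file found ({} is the filename)
find . -type f -name '*.txt' -exec wc -l {} \;

# xargs -P: up to 4 parallel invocations of gzip
find . -type f -name '*.log' -print0 | xargs -0 -P4 gzip
```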
In some cases, when you want to apply a pipeline or a complex command to each file, I find it useful to use while read:
find . -type d \
| while read d; do
find $d -type f -maxdepth 1 \
| head -n3;
done;
Hardened version:
find . -type d -print0 \
| while IFS= read -r -d '' d; do
find "$d" -type f -maxdepth 1 -print0 \
| head -z -n3;
done;
Portable filenames should only use characters from the portable filename character set https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282. But if you need to run your script in filesystems that may be compromised, or use non-portable file names for other reasons, use the hardened version.
Read the corresponding documentation of the features:
- find(1)
- sh(1)
- read(1)
- head(1)
(The above shows the first 3 files in each directory.)
However, this forces an invocation of the command or pipeline for each path name. In general, xargs(1) is the simplest (and usually fastest) way:
find . -type f \
| xargs mv -t /tmp;
Hardened version:
find . -type f -print0 \
| xargs -0 mv -t /tmp;
(The above moves all files in . to /tmp.)