Comments on How to run a command on a list of files?

Parent

How to run a command on a list of files?

+4
−0

Suppose I have a list of files on standard input. These may be the output of find, cat filelist.txt or something else.

How can I run a command on each file in turn?


Post
+4
−0
  • If I just used find to generate a list of files, then find's -exec argument is usually the way to run some other program on each file found.

    If you pipe the list of files to xargs instead, note that -P n will run up to n commands in parallel. The best value of n will depend on how heavily the task uses your CPU relative to your storage system.

  • If I have a program (say, generate_lists) that prints one filename per line,

    generate_lists | while IFS= read -r filename; do some_program "$filename"; done
    

    is usually helpful. Reading line by line avoids the word splitting that for filename in $(generate_lists) performs, and quoting "$filename" keeps each name intact -- more filenames contain spaces than you'd think.
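As a rough sketch of the find and xargs approaches mentioned above: the scratch directory and sample filenames are made up for illustration, echo stands in for whatever command you actually want to run, and -print0, -0 and -P are assumed to be available (they are in the GNU and BSD tools).

```shell
# Hypothetical scratch directory with two sample files, one containing a space.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# find runs the command itself; {} is replaced by each path in turn.
exec_count=$(find "$workdir" -type f -exec echo {} \; | wc -l)

# NUL-delimited list piped to xargs: -n 1 passes one file per invocation,
# -P 2 allows up to two invocations to run at the same time.
xargs_count=$(find "$workdir" -type f -print0 | xargs -0 -n 1 -P 2 echo | wc -l)

echo "find -exec ran $exec_count commands, xargs ran $xargs_count"
rm -rf "$workdir"
```

The NUL delimiter is what lets the name with a space survive the trip through the pipe; a plain newline-separated list fed to xargs without -0 would be split on the space.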


3 comment threads

xargs(1) is usually better than looping (1 comment)
for loops on command substitutions have quoting issues (1 comment)
Usually "number of cores" is a good starting value for `n`. (2 comments)
matthewsnyder‭ wrote 11 months ago

Usually "number of cores" is a good starting value for n.

dsr‭ wrote 11 months ago

Sometimes? It really depends. Here are some reasons why it wouldn't be:

  • Huge number of cores, low I/O: you have a 192-core box but the filesystem is mounted over a trans-Pacific link. You might want n=1 or 2 or maybe 3.

  • Simple task, great I/O: you're going to stat each file and report on metadata, and they are all stored in a tmpfs. Even if you only have 2 cores, you might want to try n=4.

Those are extremes, of course. Most cases will be somewhere in between, and fairly often this is going to be a one-off where there is no point in re-running for performance gain -- you have the answer!
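
If it does seem worth tuning, one way to find out is to try a few values of n and compare. The sketch below is only an illustration: the scratch files are a made-up workload, ls -ld stands in for the real command, and the timing has one-second resolution, so it only becomes meaningful on a realistically large job.

```shell
# Starting value: number of online cores (nproc is GNU coreutils; getconf is a fallback).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Hypothetical sample workload: a handful of empty files in a scratch directory.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do touch "$workdir/file $i"; done

# Try a few candidate values of n and report the wall-clock time for each.
for n in 1 "$cores" $((cores * 2)); do
    start=$(date +%s)
    find "$workdir" -type f -print0 | xargs -0 -n 1 -P "$n" ls -ld > /dev/null
    end=$(date +%s)
    echo "n=$n took $((end - start))s"
done

rm -rf "$workdir"
```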