Higher-order functions in Bash?


Say I have some Bash function my-func that expects a filename and does some processing on the corresponding file. For demonstration purposes:

my-func() { cat "$1"; }

To apply that function to all the text files in the current directory, I eventually figured out that I can do:

export -f my-func
find . -name '*.txt' -print0 | xargs -0 -I{} bash -c 'my-func "{}"' _

Now suppose I want to generalize this process. I want to make another function, where I can pass my-func (or some other function name, or command, or alias - the key feature is that this will expect only one argument, a filename forwarded from xargs) followed by the arguments that find should use to choose the files to process.

That is, I'd like to be able to define apply-to-files, such that I can call e.g. apply-to-files my-func . -name '*.txt' and have it do the right thing (in this case, cat each file found by find . -name '*.txt').

What does that look like? My first thought is

apply-to-files() {
    find "${@:2}" -print0 | xargs -0 -I{} bash -c '$1 "{}"' _
}

but that doesn't seem right. I'm getting lost in the quoting, and I'm confused about which arguments are coming from where. And indeed, it doesn't work; it appears to treat the filenames from find as the executable name to run, causing a bunch of "Permission denied" errors. Finally, this setup assumes that the named operation is a Bash function; I'd prefer it to work transparently with executables, aliases, shell builtins, etc. as well.


Answer
  1. Do not embed {} in the shell code. If your {} gets expanded (by xargs) to ./rogue name $(reboot).txt, then bash -c 'my-func "{}"' _ becomes bash -c 'my-func "./rogue name $(reboot).txt"' _ and reboot will be executed. Pathnames containing " are also problematic. The right way is to pass whatever {} expands to as argument(s) to bash, not inside the code string (see the sketch after this list).

  2. You don't need xargs. find … -exec … will do.
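
To see the hazard from point 1 in action, here is a harmless sketch (it reuses my-func from the question, already exported; the crafted file name is something I made up for illustration):

touch 'x$(echo INJECTED >&2).txt'
find . -name '*.txt' -print0 | xargs -0 -I{} bash -c 'my-func "{}"' _
# the inner bash receives the code:  my-func "./x$(echo INJECTED >&2).txt"
# so the command substitution runs (INJECTED appears on stderr) and my-func
# gets the already-expanded, wrong name ./x.txt instead of the real file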

This is how you can do what you want:

# this is what you already have; the only change is the double dash,
# which prevents cat from ever interpreting the argument as option(s)
my-func() { cat -- "$1"; }
export -f my-func

# this is the solution
apply-to-files() {
    find "${@:2}" -exec bash -c '"$1" "$2"' find-bash "$1" {} \;
}

Note that the solution works not only for your exported functions (e.g. apply-to-files my-func . -name '*.txt') but also for executables (e.g. apply-to-files stat . -name '*.txt').

This is how it works:

  1. When you run apply-to-files foo …, the function will behave as if you run:

    find … -exec bash -c '"$1" "$2"' find-bash foo {} \;
    

    where $1 and $2 stay literal: to the shell interpreting the function (the one that has just expanded "${@:2}" and the other "$1"), they are inside single quotes, and it is those outer quotes that matter.

  2. For any file that passes the tests given in … (the expression built from "${@:2}"), find will run bash -c … as if you typed:

    bash -c '"$1" "$2"' find-bash foo pathname_of_file_found
    
  3. This bash will execute the code "$1" "$2" with $0 set to find-bash, $1 set to foo and $2 set to pathname_of_file_found. So it will run as if you typed:

    "$1" "$2"
    

    in a shell with the respective variables set accordingly.

  4. After expansion, the command is effectively:

    foo pathname_of_file_found
    

The important thing is that this inner bash knows foo and pathname_of_file_found came from the expansion of the respective variables (because it has just expanded them itself), so it will not try to (re)interpret them as shell code. There will be exactly two words. Spaces, quotes, dollar signs, semicolons etc. may appear inside the pathname and they won't break anything. (They will also be harmless inside the first argument to apply-to-files if you ever pass such an argument.)
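
A quick check (a sketch; the directory and file name are made up, and it assumes my-func and apply-to-files from above are defined, with my-func exported):

mkdir -p /tmp/apply-demo && cd /tmp/apply-demo
printf 'hello\n' > 'weird $(date); "quoted" name.txt'
apply-to-files my-func . -name '*.txt'   # prints: hello
apply-to-files stat . -name '*.txt'      # an executable works just as well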

On the other hand, if you allowed the outer shell (i.e. the shell you run apply-to-files in), or find (or xargs), to expand something and embed the result in the shell code, then the result would be interpreted as code by the inner shell, creating a code injection vulnerability. This is the point I started this answer with.

Always build bash -c … (or sh -c …) so that the shell code is static (unless the variable part is carefully formatted to be safely interpreted as shell code; there are robust ways, but I won't elaborate). In our case the code the inner bash gets is always "$1" "$2"; the variable parts are passed as positional parameters, and this is the right way. Passing them as environment variables is also fine, though not really useful here (well, it is useful under the hood: your exported function is in fact passed as an environment variable; try env | grep my-func in the outer shell; the inner shell creates a function out of it).
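
For completeness, a sketch of the environment-variable route just mentioned (the variable name FILE is simply one I picked):

f='weird $(date) name.txt'
FILE="$f" bash -c 'printf "inner bash got: %s\n" "$FILE"'
# the inner bash reads FILE from its environment and expands it itself,
# so, as with positional parameters, nothing is re-parsed as shell code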


1 comment thread

Some details (3 comments)
Karl Knechtel wrote about 1 year ago

This is brilliant; I had a feeling (after realizing the issue in my first attempt at the question) that xargs was complicating things, given that find has a -exec argument. This really allowed me to get my head around the needed quoting schemes, too, and explains general best practices (I am not at all surprised at the dangers/vulnerabilities you mentioned).

To clarify: in the example, setting $0 as find-bash is entirely arbitrary? Also, I checked the env settings you mentioned; it appears that the entries get special names, and they're also tokenized and then reconstructed. Neat.

Karl Knechtel wrote about 1 year ago

I realize now that I had originally been thinking in terms of xargs because of a previous wrapper I made to identify files to tally with wc (which needs a single invocation with all the arguments, in order to get the sum, of course). I went back and fixed that to use -print0 / -0 at least.
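
(For reference, the wc case mentioned here can be handled either way; a sketch, assuming the same kind of file selection as above:)

find . -name '*.txt' -print0 | xargs -0 wc
# or without xargs:
find . -name '*.txt' -exec wc {} +
# both hand many file names to a single wc invocation, so the "total" line is
# meaningful; with very many files either form may split into several
# invocations, each printing its own total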