Higher-order functions in Bash?
Say I have some Bash function my-func, which expects a filename and does some processing on the corresponding file. For demonstration purposes,
my-func() { cat "$1"; }
If I want to apply that function to all the text files in the current directory, I eventually figured out that I can do:
export -f my-func
find . -name '*.txt' -print0 | xargs -0 -I{} bash -c 'my-func "{}"' _
Now suppose I want to generalize this process. I want to make another function, where I can pass my-func (or some other function name, or command, or alias - the key feature is that this will expect only one argument, a filename forwarded from xargs) followed by the arguments that find should use to choose the files to process.
That is, I'd like to be able to define apply-to-files, such that I can call e.g. apply-to-files my-func . -name '*.txt' and have it do the right thing (in this case, cat each file found by find . -name '*.txt').
What does that look like? My first thought is
apply-to-files() {
    find "${@:2}" -print0 | xargs -0 -I{} bash -c '$1 "{}"' _
}
but that doesn't seem right. I'm getting lost in the quoting, and I'm confused about which arguments are coming from where. And indeed, it doesn't seem to work; it appears to be treating the filenames from find as the executable name to run, causing a bunch of "Permission denied" errors. Finally, this setup is assuming that the named operation will be a Bash function; I'd prefer if it could work transparently with (executables, aliases, shell builtins, ...) as well.
Answer
- Do not embed {} in the shell code. If your {} gets expanded (by xargs) to ./rogue name $(reboot).txt then bash -c 'my-func "{}"' _ will become bash -c 'my-func "./rogue name $(reboot).txt"' _ and reboot will be executed. Pathnames containing " will also be problematic. The right way is to pass what {} expands to as argument(s) to bash, not inside the code string (see the sketch just after this list).
- You don't need xargs. find … -exec … will do.
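For example, a minimal sketch of the safe variant if you want to keep xargs (assuming my-func has been exported as in the question; the pathname reaches the inner bash as a positional parameter instead of being pasted into the code string):
find . -name '*.txt' -print0 | xargs -0 -I{} bash -c 'my-func "$1"' _ {}
Here {} is still expanded by xargs, but only into a separate argument of bash, so the inner shell never interprets the pathname as code.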
This is how you can do what you want:
# this is what you already have, I only added the double-dash to prevent cat from ever interpreting the argument as option(s)
my-func() { cat -- "$1"; }
export -f my-func
# this is the solution
apply-to-files() {
    find "${@:2}" -exec bash -c '"$1" "$2"' find-bash "$1" {} \;
}
Note the solution will work not only for your exported functions (e.g. apply-to-files my-func . -name '*.txt') but also for executables (e.g. apply-to-files stat . -name '*.txt').
This is how it works:
- When you run apply-to-files foo …, the function will behave as if you run:
  find … -exec bash -c '"$1" "$2"' find-bash foo {} \;
  where $1 and $2 stay literal because for the shell interpreting the function (that has just expanded "${@:2}" and (another) "$1") they were single-quoted (the outer quotes matter).
- For any file that passes tests included in …, find will run bash -c … as if you typed:
  bash -c '"$1" "$2"' find-bash foo pathname_of_file_found
- This bash will execute the code being "$1" "$2" with $0 being find-bash, $1 being foo and $2 being pathname_of_file_found. So it will execute as if you typed:
  "$1" "$2"
  in a shell with the respective variables set accordingly.
- After expansion the command will be like:
  foo pathname_of_file_found
The important thing is this inner bash knows that foo and pathname_of_file_found came from the expansion of the respective variables (because it has just expanded the variables by itself), so it will not try to (re)interpret them as shell code. There will be exactly two words. Spaces, quotes, dollar signs, semicolons etc. may appear inside the pathname and they won't break anything. (They will also be harmless inside the first argument to apply-to-files, if you ever pass such an argument.)
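To convince yourself, here is a hypothetical test session with a deliberately nasty filename (nothing inside it gets executed):
my-func() { cat -- "$1"; }
export -f my-func
touch './a b"; $(echo pwned).txt'
apply-to-files my-func . -name '*.txt'
The inner bash receives that pathname only as $2, so my-func gets it as one literal argument and cat just prints the (empty) file.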
On the other hand, if you allowed the outer shell (i.e. the shell you run apply-to-files in) or find (or xargs) to expand something and embed the result in the shell code, then the result would be interpreted as code by the inner shell and this would create a code injection vulnerability. This is the point I started this answer with.
Always build bash -c … (or sh -c …) so the shell code is static (unless the variable part is carefully formatted to be safely interpreted as shell code; there are robust ways, but I won't elaborate). In our case the code the inner bash gets is always "$1" "$2"; the variable things are passed as positional parameters, and this is the right way. Passing as environment variables is also fine, but not really useful here (well, it is useful here under the hood: your exported function is in fact passed as an environment variable; try env | grep my-func in the outer shell; the inner shell creates a function out of it).
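For illustration only, a sketch of what the environment-variable route could look like (not needed here; the names cmd and pathname are made up, and this relies on GNU find replacing {} even when it is embedded in a larger argument):
apply-to-files-env() {
    find "${@:2}" -exec env cmd="$1" pathname={} bash -c '"$cmd" "$pathname"' \;
}
The code string is still static; the command name and the pathname reach the inner bash as environment variables instead of positional parameters.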