Recursively remove files with the same name as the ones that end in `.part`
I want to remove all files with the ".part" extension in the current directory and its subdirectories, including files with the same name but different extension.
Is this correct?
find . -name '*.part' -exec sh -c 'base="$(basename "$1" .part)"; find . -name "$base*" -delete' sh {} \;
3 answers
It is incorrect for two reasons.
1. File names containing glob characters
This is an edge case scenario.
Consider this structure:
.
├── abc
├── abc.part
├── cde
└── c*e.part
The outermost Find will find:
- `abc.part`, so `base=abc` and the innermost Find looks for files matching the glob `abc*`, which matches the `abc` file. Good.
- `c*e.part`, so `base=c*e` and the innermost Find looks for files matching the glob `c*e*`, which matches the `cde` file. Bad, because `cde` does not contain `c*e`.
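To see the first problem in isolation, here is a minimal sketch that rebuilds the example tree in a throwaway directory and runs only the inner search. The `sandbox` and `matches` variable names are mine, and the inner `find` is pointed at the sandbox rather than `.` purely for the demo.

```shell
# Throwaway sandbox reproducing the tree from the example above.
sandbox=$(mktemp -d)
touch "$sandbox/abc" "$sandbox/abc.part" "$sandbox/cde" "$sandbox/c*e.part"

# What the inner find receives for the file named 'c*e.part'.
base=$(basename "$sandbox/c*e.part" .part)   # base is the literal string c*e

# -name treats the * inside $base as a wildcard, so cde is matched as well.
matches=$(find "$sandbox" -name "$base*" | sort)
printf '%s\n' "$matches"
```

The listing should contain both `c*e.part` and the unrelated `cde`, which is exactly the false positive described above.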
2. File names with extra characters
If you have `abcde` and `abc.part` files, the former will be deleted because it matches `abc*`, as should be clear from the previous case discussion.
This particular problem would be easily fixed by changing `$base*` -> `$base.*`.
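A quick way to compare the two patterns, sketched on made-up files in a temporary directory (the names `sandbox`, `loose` and `strict` are only for illustration):

```shell
sandbox=$(mktemp -d)
touch "$sandbox/abc.part" "$sandbox/abc.txt" "$sandbox/abcde"
base=abc

# Too broad: abc* also matches the unrelated file abcde.
loose=$(find "$sandbox" -type f -name "$base*" | wc -l)

# Narrower: abc.* requires a literal dot right after the base name.
strict=$(find "$sandbox" -type f -name "$base.*" | wc -l)

echo "loose=$loose strict=$strict"
```

Note that the narrower pattern also stops matching a bare `abc` with no extension at all, so whether it is the right fix depends on whether such files should go too.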
Proposed solution
Point 1 is the real challenge: it is quite involved to feed the file names back into another Find's `-name` argument and escape the metacharacters, which is always a minefield.
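For a sense of what that escaping involves, here is a rough sketch that backslash-escapes the pattern characters with `sed` before handing the name to `-name`. It is an illustration, not a vetted solution: it assumes file names contain no newlines, and the `escaped` variable and the demo files are made up.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/cde" "$sandbox/cde.txt" "$sandbox/c*e.part" "$sandbox/c*e.txt"
base='c*e'

# Backslash-escape the characters that -name treats specially: * ? [ ] and \
escaped=$(printf '%s\n' "$base" | sed 's/[][*?\\]/\\&/g')

# Only names starting with the literal text "c*e." match now; cde.txt does not.
matches=$(find "$sandbox" -name "$escaped.*" | sort)
printf '%s\n' "$matches"
```

Even this short version needs a carefully ordered bracket expression and breaks on unusual names, which is why avoiding the round trip through `-name` is attractive.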
I propose instead to use a shell with support for `**`, the recursive glob: for example Bash or Ksh with the `globstar` option set, or Zsh.
#!/bin/bash
shopt -s globstar #Not needed in Zsh
for f in ./**/*.part; do
rm ./**/"$(basename "$f" .part)".*
done
For a breakdown,
- In the `for` line, `**/*.part` matches `./a.part` but also `./a/b/c.part` (hence "recursive glob").
- In the `rm` line, `"$(basename "$f" .part)"` removes all directory components of the file name and its `.part` extension. This would boil down to `a` and `c` in our example.
So the full line `rm ./**/"$(basename "$f" .part)".*` recursively removes files matching the `a.*` and `c.*` patterns.
It is crucial not to quote the `*` characters in the example, because we want them to act as globs (and not to be taken literally).
I might be inclined to try...
find . -type f -name '*.part' -exec sh -c '
[ -f "${1%.part}" ] && rm -i -- "${1%.part}";
for f in "${1%.part}".*; do
[ -f "$f" ] && rm -i -- "$f";
done
' -- {} \;
(newlines for readability; can be elided if one-liner means something to you...)
- `find . -type f -name '*.part'`: find files ending with .part
- `-exec sh -c '...' -- {} \;`: run a shell script ... for each found file; the path to the file is in $1 in the child script
- `"${1%.part}"`: strip .part from the end of the path in $1 (like the suffix removal `basename` does, but without the extra process, and the directory part is kept)
- `[ -f "${1%.part}" ] && ...;`: if a file exists with no extension, do the ... bit
- `rm -i -- "${1%.part}"`: delete the file with no extension
- `for f in "${1%.part}".*; do ... done`: loop over each path matching the filename with any extension; the path is stored in $f (this includes the one with the .part extension)
- `[ -f "$f" ] && ...;`: if the path in $f exists and is a file, do the ... bit
- `rm -i -- "$f"`: remove the file in $f
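If the `${1%.part}` expansion is unfamiliar, this small sketch contrasts it with `basename` on a made-up path (the `path`, `stem` and `name` variables are only for illustration):

```shell
path='./some dir/report.part'    # hypothetical path, as find would pass it in $1

stem=${path%.part}               # strips the suffix, keeps the directory
name=$(basename "$path" .part)   # strips the suffix and the directory

echo "stem=$stem"
echo "name=$name"
```

Keeping the directory is what lets the script above look for siblings right next to each `.part` file instead of searching the whole tree again.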
Note that I'm using various checks that the thing I'm asking to delete is a file, not a directory, link, fifo, etc.
If limiting only to files is less of a concern, you might well be able to shorten this to...
find . -name '*.part' -exec sh -c 'rm -i -- "${1%.part}" "${1%.part}".*' -- {} \;
`rm` may write errors if its arguments don't expand to existing paths; hide them with judicious use of `2>/dev/null` redirection, if you care.
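As a sketch of where that redirection could go, this runs the short form against a temporary directory with made-up files. I've dropped `-i` here only so the demo runs without prompting, and the redirection sits inside the child script so that it applies to `rm`.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/abc" "$sandbox/abc.part" "$sandbox/abc.txt" \
      "$sandbox/lonely.part" "$sandbox/keep.txt"

# lonely.part has no extension-less sibling, so rm complains about "lonely";
# the redirection inside the child shell discards that message.
find "$sandbox" -name '*.part' -exec sh -c '
  rm -- "${1%.part}" "${1%.part}".* 2>/dev/null
' -- {} \;

ls "$sandbox"
```

Only `keep.txt` should be left afterwards. Bear in mind that the redirection hides every `rm` error, including ones you might want to see, such as permission problems.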
For fewer subshells, you may be able to pass all found files to the same shell in one go, with...
find . -name '*.part' -exec sh -c 'while [ -n "$1" ]; do rm -i -- "${1%.part}" "${1%.part}".*; shift; done' -- {} \+
...but this might be painful for larger file lists.
In general, note there is technically a race condition between the various tests and the eventual delete, but that's only a concern if multiple processes are acting on that directory tree. Not sure how to avoid that.
Finally, `rm -i` is used to prompt y/n for each file to delete, as a safety net. Remove the `-i` switch from the `rm` calls if you are confident.
For each file named `foo.xyz`, you want to delete `foo.xyz.part`. It doesn't matter if `foo.xyz.part` exists; you can just attempt it and skip errors.
You can get a list of all files with `find` etc. But you don't want the ones with `.part`, so you use grep to take them out: `find | grep -v '\.part$'`. `$` means end of string and `\.` is because otherwise `.` means any character in regex.
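To illustrate why the dot is escaped, here is a small sketch on a made-up list of names; `apart` ends in "part" but has no dot before it.

```shell
# Hypothetical file list standing in for the output of find.
list=$(printf '%s\n' ./a.part ./apart ./b.txt)

# Escaped dot: only names that really end in ".part" are filtered out.
kept=$(printf '%s\n' "$list" | grep -v '\.part$')

# Unescaped dot: "." matches any character, so ./apart is filtered out too.
kept_loose=$(printf '%s\n' "$list" | grep -v '.part$')

printf 'escaped:\n%s\n' "$kept"
printf 'unescaped:\n%s\n' "$kept_loose"
```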
You can then attempt to delete the `.part` counterpart of each one: `find | grep -v '\.part$' | parallel rm {}.part`
If the file doesn't exist, Parallel will show you the error message, but it will still delete the ones that do exist. You can do a bunch of extra filtering with `comm` to only attempt to delete those files which do exist, but there's no need in this case.
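For completeness, the `comm` filtering could look roughly like this. It is a sketch on made-up files, assuming no file name contains a newline; the `tree` and `lists` directories and the list file names are mine, and a plain `while read` loop stands in for Parallel.

```shell
tree=$(mktemp -d)     # directory tree to clean (demo data)
lists=$(mktemp -d)    # scratch space for the sorted lists, kept outside the tree
touch "$tree/foo.xyz" "$tree/foo.xyz.part" "$tree/bar.txt" "$tree/orphan.part"
cd "$tree" || exit 1

# Names we would like to delete: every non-.part file with ".part" appended.
find . -type f | grep -v '\.part$' | sed 's/$/.part/' | sort > "$lists/wanted"

# Names that actually exist.
find . -type f -name '*.part' | sort > "$lists/existing"

# comm -12 prints only the lines common to both sorted lists.
comm -12 "$lists/wanted" "$lists/existing" > "$lists/to_delete"

# Assumes no file name contains a newline.
while IFS= read -r f; do
  rm -- "$f"
done < "$lists/to_delete"
```

Here only `foo.xyz.part` ends up in the delete list: `bar.txt` has no `.part` counterpart, and `orphan.part` has no matching original.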