Recursively remove files with the same name as the ones that end in `.part`
I want to remove all files with the ".part" extension in the current directory and its subdirectories, including files with the same name but different extension.
Is this correct?
find . -name '*.part' -exec sh -c 'base="$(basename "$1" .part)"; find . -name "$base*" -delete' sh {} \;
3 answers
It is incorrect for two reasons.
1. File names containing glob characters
This is an edge case scenario.
Consider this structure:
.
├── abc
├── abc.part
├── cde
└── c*e.part
The outermost Find will find:
- `abc.part`, so `base=abc` and the innermost Find looks for files matching the glob `abc*`, which matches the `abc` file. Good.
- `c*e.part`, so `base=c*e` and the innermost Find looks for files matching the glob `c*e*`, which matches the `cde` file. Bad, because `cde` does not contain `c*e`.
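The bad match can be reproduced without deleting anything by running the inner Find with its default print action instead of `-delete`. This is a minimal sketch, assuming a scratch directory from `mktemp -d`; the names `demo` and `matches` are illustrative only.

```shell
#!/bin/sh
# Build the example tree in a throwaway directory (assumed scratch location).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch abc abc.part cde 'c*e.part'

# Same base-name computation as in the question's command.
base=$(basename 'c*e.part' .part)   # base is the literal string c*e

# Inner find, listing instead of deleting. -name treats the * inside
# $base as a wildcard, so the unrelated file cde is matched as well.
matches=$(find . -name "$base*")
printf '%s\n' "$matches"
```

The listing contains both `./c*e.part` and `./cde`, which is exactly the over-match described above.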
2. File names with extra characters
If you have `abcde` and `abc.part` files, the former will be deleted because it matches `abc*`, as should be clear from the discussion of the previous case.
This particular problem would be easily fixed by changing `$base*` to `$base.*`.
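To see the difference between the two patterns side by side, something like the following could be run in a scratch directory. It is only a sketch: the file names and the variables `loose` and `strict` are made up for the demonstration.

```shell
#!/bin/sh
# Throwaway tree with a name that merely starts with "abc".
demo=$(mktemp -d)
cd "$demo" || exit 1
touch abc abc.part abcde abc.txt

base=abc
loose=$(find . -name "$base*")     # also lists ./abcde
strict=$(find . -name "$base.*")   # only names that start with "abc."
printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Note that `$base.*` also stops matching the bare `abc`, so a file with no extension would need its own `-name "$base"` test if it should be removed too.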
Proposed solution
Point 1 is the real challenge: it is quite involved to feed the file names back into another Find's `-name` argument and escape the meta-characters, which is always a minefield.
I propose instead to use a shell with support for `**`, the recursive glob: for example Bash or Ksh with the `globstar` option set, or Zsh.
#!/bin/bash
shopt -s globstar #Not needed in Zsh
for f in ./**/*.part; do
rm ./**/"$(basename "$f" .part)".*
done
For a breakdown:
- In the `for` line, `**/*.part` matches `./a.part` but also `./a/b/c.part` (hence "recursive glob").
- In the `rm` line, `"$(basename "$f" .part)"` removes all directory components of the file name and its `.part` extension. This would boil down to `a` and `c` in our example.
So the full line `rm ./**/"$(basename "$f" .part)".*` recursively removes files matching the `a.*` and `c.*` patterns.
It is crucial not to quote the `*` characters in the example, because we want them to act as globs (and not to be taken literally).
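Before running the loop for real, it may be worth listing what it would remove. The sketch below is a dry-run variant under a few assumptions: it needs Bash 4 or later for `globstar`, it adds `nullglob` (not part of the original script) so that unmatched patterns expand to nothing, and the tree and the `candidates` array are purely illustrative.

```shell
#!/bin/bash
# Dry run: collect the paths the loop would pass to rm, without deleting.
shopt -s globstar nullglob   # nullglob is an addition for this sketch

demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p a/b
touch a.part a.txt a/b/c.part a/c.log keep.txt

candidates=()
for f in ./**/*.part; do
    base=$(basename "$f" .part)
    # The unquoted ** and * act as globs; "$base" stays quoted and literal.
    candidates+=( ./**/"$base".* )
done
printf '%s\n' "${candidates[@]}"
```

Without `nullglob`, a tree containing no `.part` files would make the loop body run once with the unexpanded pattern, so the option mainly guards that corner case.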
I might be inclined to try...
find . -type f -name '*.part' -exec sh -c '
[ -f "${1%.part}" ] && rm -i -- "${1%.part}";
for f in "${1%.part}".*; do
[ -f "$f" ] && rm -i -- "$f";
done
' -- {} \;
(newlines for readability; can be elided if one-liner means something to you...)
- `find . -type f -name '*.part'`: find files ending with `.part`
- `-exec sh -c '...' -- {} \;`: run a shell script `...` for each found file; the path to the file is in `$1` in the child script
- `"${1%.part}"`: strip `.part` from the end of the path in `$1` (like `basename` with a suffix argument, but without the extra process, and keeping the directory part of the path)
- `[ -f "${1%.part}" ] && ...;`: if a file exists with no extension, do the `...` bit
- `rm -i -- "${1%.part}"`: delete the file with no extension
- `for f in "${1%.part}".*; do ... done`: loop over each path matching the filename with any extension; the path is stored in `$f` (this includes the one with the `.part` extension)
- `[ -f "$f" ] && ...;`: if the path in `$f` exists and is a file, do the `...` bit
- `rm -i -- "$f"`: remove the file in `$f`
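The difference between the parameter expansion and `basename` is easy to check in isolation. A small sketch, using a made-up path that is never touched on disk:

```shell
#!/bin/sh
# Illustrative path only; nothing is read from or written to disk.
path='./sub/dir/video.mp4.part'

stripped=${path%.part}             # suffix removed, directory kept
named=$(basename "$path" .part)    # suffix and directory both removed

printf '%s\n%s\n' "$stripped" "$named"
```

Because `${1%.part}` keeps the directory, the loop in this answer only looks for siblings in the same directory as each `.part` file, not elsewhere in the tree.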
Note that I'm using various checks that the thing I'm asking to delete is a file, not a directory, link, fifo, etc.
If limiting only to files is less of a concern, you might well be able to shorten this to...
find . -name '*.part' -exec sh -c 'rm -i -- "${1%.part}" "${1%.part}".*' -- {} \;
`rm` may write errors if its arguments don't expand to existing paths; hide them with judicious use of `2>/dev/null` redirection, if you care (bearing in mind that `rm -i` also writes its prompts to standard error).
For fewer subshells, you may be able to pass all found files to the same shell in one go, with...
find . -name '*.part' -exec sh -c 'while [ -n "$1" ]; do rm -i -- "${1%.part}" "${1%.part}".*; shift; done' -- {} \+
...but this might be painful for larger file lists.
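A possible variation on the batched form is to iterate with `for p do`, which walks every positional parameter without testing `$1` for emptiness. The sketch below is a dry run under assumptions: `printf` stands in for `rm -i`, `sh` is used as the `$0` placeholder instead of `--`, and the tree and the variable `listed` are invented for the demonstration.

```shell
#!/bin/sh
# Scratch tree for the demonstration.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
touch one.part sub/two.part sub/two.txt

# One sh process receives all found paths; printf replaces rm -i here.
listed=$(find . -name '*.part' -exec sh -c '
    for p do
        printf "%s\n" "${p%.part}"
    done
' sh {} +)
printf '%s\n' "$listed"
```

Once the printed stems look right, the `printf` line could be swapped back for the `rm -i -- "${p%.part}" "${p%.part}".*` call.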
In general, note there is technically a race condition between the various tests and the eventual delete, but that's only a concern if multiple processes are acting on that directory tree. Not sure how to avoid that.
Finally, `rm -i` is used to prompt y/n for each file to delete, as a safety net. Remove the `-i` switch from the `rm` calls if you are confident.
For each file named `foo.xyz`, you want to delete `foo.xyz.part`. It doesn't matter whether `foo.xyz.part` exists; you can just attempt it and skip errors.
You can get a list of all files with `find` etc. But you don't want the ones ending in `.part`, so you use grep to take them out: `find | grep -v '\.part$'`. `$` means end of string, and `\.` is needed because otherwise `.` means any character in regex.
You can then attempt to delete the matching `.part` file for each one: `find | grep -v '\.part$' | parallel rm {}.part`
If the file doesn't exist, Parallel will show you the error message, but it will still delete the ones that do exist. You can do a bunch of extra filtering with `comm` to only attempt to delete those files which do exist, but there's no need in this case.
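For reference, the `comm` filtering could look roughly like the sketch below. It is only an illustration: the scratch tree, the `lists` directory and the `to_delete` variable are assumptions, and the approach is line-based, so it would not cope with newlines in file names.

```shell
#!/bin/sh
demo=$(mktemp -d)    # scratch tree
lists=$(mktemp -d)   # kept outside the tree so find does not list these files
cd "$demo" || exit 1
touch foo.xyz foo.xyz.part bar.xyz lone.part

# Names that would be attempted: every non-.part file with .part appended.
find . -type f | grep -v '\.part$' | sed 's/$/.part/' | LC_ALL=C sort > "$lists/wanted"
# Names that actually exist.
find . -type f -name '*.part' | LC_ALL=C sort > "$lists/present"

# comm -12 prints only the lines common to both sorted lists.
to_delete=$(LC_ALL=C comm -12 "$lists/wanted" "$lists/present")
printf '%s\n' "$to_delete"
```

Here only `./foo.xyz.part` survives the filter: `bar.xyz` has no `.part` companion, and `lone.part` has no file named `lone`.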