Bash find duplicate files
This one-liner is taken from:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
and is explained at:
http://heyrod.com/snippet/t/linux.html
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
The explanation is as follows:
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
You probably want to redirect the output to a file, as the command runs slowly.
If I understand this correctly:
Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, in descending order.
Line 3 keeps one copy of each size that appears more than once and drops the sizes that appear only once.
For each remaining size, line 4 finds all the files of that size.
Line 5 computes the MD5 hash of every file found in line 4, printing the hash followed by the file name.
Line 6 sorts that list for easy comparison.
Line 7 compares the first 32 characters of each line (the MD5 hash) and keeps only the lines whose hash repeats, separating each group of duplicates with a blank line.
Line 8 spits out the file name and path part of the matching lines.
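The steps above can be wrapped in a small function so that the search directory becomes a parameter. This is only a sketch: the name find_dupes is made up for illustration, it assumes the GNU versions of find, xargs, uniq and md5sum, and it adds -r to the second xargs (not in the original one-liner) so that md5sum is not run when there are no candidates.

```shell
#!/usr/bin/env bash
# find_dupes DIR: print groups of files under DIR that have identical content.
# Hypothetical helper; requires GNU find, xargs, uniq and md5sum.
find_dupes() {
    local dir=${1:-.}
    find "$dir" -not -empty -type f -printf '%s\n' |   # size of each non-empty regular file
        sort -rn |                                     # numeric sort, largest first
        uniq -d |                                      # one copy of each size seen more than once
        xargs -I{} find "$dir" -type f -size {}c -print0 |   # every file with exactly that size
        xargs -0 -r md5sum |                           # hash only the candidates
        sort |                                         # bring equal hashes together
        uniq -w32 --all-repeated=separate |            # keep lines whose first 32 chars repeat
        cut -d' ' -f3-                                 # drop the hash, keep the path
}
```

Calling find_dupes /some/dir > dupes.txt writes the groups to a file, one path per line, with a blank line between groups.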
fdupes
Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:
- Comparing partial md5sum signatures
- Comparing full md5sum signatures
- byte-by-byte comparison verification
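The staged comparison listed above can be illustrated for a single pair of files. The sketch below is not fdupes' actual code: same_content is a hypothetical helper, the 4096-byte prefix used for the partial signature is an assumed value, and the initial size check is an extra cheap step that the list above does not mention.

```shell
# same_content FILE_A FILE_B: succeed (exit 0) only if both files hold identical bytes.
# Hypothetical helper for illustration; the 4096-byte prefix is an assumption.
same_content() {
    local a=$1 b=$2
    # Stage 0: files of different size can never be duplicates.
    [ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] || return 1
    # Stage 1: partial signature, hashing only the first 4096 bytes.
    [ "$(head -c 4096 "$a" | md5sum)" = "$(head -c 4096 "$b" | md5sum)" ] || return 1
    # Stage 2: full signature over the whole file.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # Stage 3: byte-by-byte verification guards against hash collisions.
    cmp -s "$a" "$b"
}
```

Each stage is more expensive than the previous one, so most non-duplicates are rejected before the whole file has to be read.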
Like rdfind, it offers options to:
- Search recursively
- Exclude empty files
- Show size of duplicate files
- Delete duplicates immediately
- Exclude files with a different owner
Install fdupes with your distribution's package manager, then search the given directories recursively for duplicates:
fdupes --recurse /dir1 /dir2
To choose which files to keep, run fdupes with -d; after selecting the files, just press <DEL>:
fdupes --recurse -d /dir1 /dir2
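Several of the options listed earlier can be combined in one non-destructive call. The sketch below assumes an fdupes build that accepts the long options --recurse, --noempty and --size; list_dupes is a hypothetical wrapper name, and nothing is deleted because --delete is left out.

```shell
# list_dupes DIR...: list duplicate files and their sizes without deleting anything.
# Hypothetical wrapper; assumes fdupes supports these long options.
list_dupes() {
    command -v fdupes >/dev/null 2>&1 || {
        echo "fdupes is not installed" >&2
        return 127
    }
    # --recurse: descend into subdirectories
    # --noempty: skip zero-length files
    # --size:    print the size of each set of duplicates
    fdupes --recurse --noempty --size "$@"
}
```

Adding --permissions should additionally keep files with a different owner, group or permission bits from being reported as duplicates.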
Updated by Jeremias Keihsler about 3 years ago · 2 revisions