Jeremias Keihsler, 10.04.2021 23:44


Bash find duplicate files

This one-liner is taken from:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and is explained at:
http://heyrod.com/snippet/t/linux.html

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d  | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

The explanation is as follows. It uses the same pipeline, split across numbered lines, with a final cut step added:

1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-

You probably want to redirect the output to a file, as the pipeline runs slowly.

If I understand this correctly:

Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, in descending order.
Line 3 keeps only the sizes that appear more than once, printing each such size a single time.
For each remaining size, line 4 finds all the files of exactly that size (the c suffix means bytes).
Line 5 computes the MD5 hash of every file found in line 4, outputting the MD5 hash and file name. The file names for all sizes arrive as one NUL-separated stream, so md5sum runs on them in batches.
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and prints only the lines whose hash repeats, with a blank line between groups.
Line 8 drops the hash and prints only the path of each matching file.
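
The steps above can be wrapped in a small shell function for reuse. This is only a sketch: the name find_dupes and the optional directory argument are not part of the original one-liner, and it assumes GNU findutils and coreutils (for -printf, xargs -r and uniq -w). The -n1 is left out because -I already runs find once per input line, and -r keeps md5sum from running when no candidate files are found.

```shell
# find_dupes is a hypothetical wrapper name, not from the original one-liner.
# Usage: find_dupes [directory]   (defaults to the current directory)
find_dupes() {
    dir="${1:-.}"
    # 1. sizes of all non-empty regular files
    find "$dir" -not -empty -type f -printf '%s\n' |
        # 2.-3. keep each size that occurs more than once
        sort -rn |
        uniq -d |
        # 4. list every file of exactly that size, NUL-separated
        xargs -I{} find "$dir" -type f -size {}c -print0 |
        # 5. hash the candidates (-r: do nothing on empty input)
        xargs -0 -r md5sum |
        # 6.-7. group identical hashes, blank line between groups
        sort |
        uniq -w32 --all-repeated=separate |
        # 8. strip the hash, keep the path
        cut -d' ' -f3-
}
```

Calling find_dupes /some/dir > dupes.txt would then write the groups of duplicate paths to a file.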

fdupes

Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:

  • Comparing partial md5sum signatures
  • Comparing full md5sum signatures
  • Byte-by-byte comparison for verification
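
The idea behind these three stages is to run cheap checks first and expensive ones only on files that still look identical. The following is a rough shell illustration of that idea, not fdupes's actual implementation: the function name same_content and the 4096-byte prefix length are assumptions chosen for the example.

```shell
# same_content is a hypothetical helper: returns 0 if both files have identical content.
same_content() {
    a="$1"
    b="$2"
    # Stage 1: compare the MD5 of only the first 4096 bytes (prefix length is arbitrary here).
    [ "$(head -c 4096 -- "$a" | md5sum)" = "$(head -c 4096 -- "$b" | md5sum)" ] || return 1
    # Stage 2: compare the MD5 of the full content.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # Stage 3: byte-by-byte comparison, which rules out a hash collision.
    cmp -s -- "$a" "$b"
}
```

Most non-duplicates are rejected at the first stage, so the full hash and the byte-by-byte comparison are only paid for on likely matches.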

Like rdfind, it offers options to:

  • Search recursively
  • Exclude empty files
  • Show the size of duplicate files
  • Delete duplicates immediately
  • Exclude files with a different owner

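fdupes is packaged by most Linux distributions and can be installed with the distribution's package manager. Once it is installed, the following command searches two directories recursively for duplicates: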

fdupes --recurse /dir1 /dir2
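
Some of the options listed above can be combined in one call. The sketch below assumes the long option names --noempty and --size from the fdupes manual page, and list_dupes is a hypothetical wrapper name. It only lists duplicates and deletes nothing.

```shell
# list_dupes is a hypothetical wrapper: list duplicates under the given directories.
list_dupes() {
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # --recurse: descend into subdirectories
    # --noempty: ignore zero-length files
    # --size:    print the size of each set of duplicates
    fdupes --recurse --noempty --size "$@"
}
```

For example, list_dupes /dir1 /dir2 prints each set of duplicate files as a group of paths.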

Updated by Jeremias Keihsler more than 3 years ago · 1 revision