Bash find duplicate files » History » Version 2
Jeremias Keihsler, 02.09.2021 12:44
h1. Bash find duplicate files

This one-liner is taken from:

http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and has been explained at:

http://heyrod.com/snippet/t/linux.html

<pre><code class="bash">
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code></pre>

The explanation is as follows:

<pre>
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
</pre>

You probably want to redirect the output to a file, as the command runs slowly.

If I understand this correctly:

Line 1 enumerates the sizes of all non-empty regular files.
Line 2 sorts the sizes numerically, largest first.
Line 3 strips out the sizes that appear only once.
For each remaining size, line 4 finds all the files of that size.
Line 5 computes the MD5 hash of every file found in line 4, printing the hash followed by the file name. (This is repeated for each set of files of a given size.)
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and keeps only the duplicates, separating each group with a blank line.
Line 8 prints the file name and path part of the matching lines.

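As a sanity check on the steps above, here is a self-contained sketch that creates two identical files and one unique file in a scratch directory (the file names are made up for illustration) and runs the pipeline, including the final @cut@ to keep only the paths:

```shell
#!/bin/sh
# Demo of the duplicate-finding pipeline in a scratch directory.
# Assumes GNU find/coreutils (for -printf, md5sum, uniq -w).
set -e
tmp=$(mktemp -d)
cd "$tmp"

printf 'hello world\n' > a.txt      # 12 bytes, duplicate content
printf 'hello world\n' > b.txt      # 12 bytes, duplicate content
printf 'something else\n' > c.txt   # 15 bytes, unique content

# Same pipeline as above, plus the final cut:
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find . -type f -size {}c -print0 \
  | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate \
  | cut -d" " -f3-
# Only ./a.txt and ./b.txt are printed; c.txt has a unique size
# and never reaches the md5sum stage.

cd / && rm -rf "$tmp"
```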
h2. fdupes

Fdupes is another program that allows you to identify duplicate files on your system. It is free, open-source, and written in C. It uses the following methods to determine duplicate files:

* comparing partial md5sum signatures
* comparing full md5sum signatures
* byte-by-byte comparison verification

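This comparison cascade can be sketched in shell, assuming GNU coreutils. This is a simplified illustration of the idea, not fdupes' actual implementation (fdupes is written in C), and the helper names and the 4 KiB partial-read size are invented for the sketch:

```shell
#!/bin/sh
# Sketch of fdupes' comparison cascade (illustration only, not its real code):
# 1. compare a partial md5sum (first 4 KiB here),
# 2. compare the full md5sum only if the partials match,
# 3. verify byte-by-byte with cmp.
set -e

partial_md5() { head -c 4096 "$1" | md5sum | cut -d" " -f1; }
full_md5()    { md5sum "$1" | cut -d" " -f1; }

is_duplicate() {
    [ "$(partial_md5 "$1")" = "$(partial_md5 "$2")" ] || return 1
    [ "$(full_md5 "$1")" = "$(full_md5 "$2")" ] || return 1
    cmp -s "$1" "$2"   # final byte-by-byte verification
}

# Demo with made-up files in a scratch directory:
tmp=$(mktemp -d)
printf 'same content\n'  > "$tmp/x"
printf 'same content\n'  > "$tmp/y"
printf 'other content\n' > "$tmp/z"

is_duplicate "$tmp/x" "$tmp/y" && echo "x and y are duplicates"
is_duplicate "$tmp/x" "$tmp/z" || echo "x and z differ"

rm -rf "$tmp"
```

The point of the cascade is that cheap checks (file size, a hash of the first few kilobytes) eliminate most candidate pairs before any whole file has to be read.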
Just like rdfind, it offers similar options:

* search recursively
* exclude empty files
* show the size of duplicate files
* delete duplicates immediately
* exclude files with a different owner

Install fdupes with your distribution's package manager (for example @dnf install fdupes@ or @apt install fdupes@). To search one or more directories recursively for duplicates, run:

<pre><code class="shell">
fdupes --recurse /dir1 /dir2
</code></pre>

To choose which files to keep, run fdupes with the @-d@ option; after selecting the files to preserve, just press @<DEL>@:

<pre><code class="shell">
fdupes --recurse -d /dir1 /dir2
</code></pre>