Bash find duplicate files » History » Version 1
Jeremias Keihsler, 13.01.2017 10:43
This one-liner is taken from:

http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and has been explained at:

http://heyrod.com/snippet/t/linux.html

<pre><code class="bash">
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code></pre>
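To see it in action, here is a small test run in a scratch directory (the directory and file names are just for illustration; GNU find and coreutils are assumed, as the one-liner itself requires):

```shell
# Create a scratch directory with two identical files and one file of the
# same size but different content.
dir=$(mktemp -d)
cd "$dir"
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt   # duplicate of a.txt
printf 'world\n' > c.txt   # same size (6 bytes), different content

# Run the one-liner; only a.txt and b.txt end up grouped as duplicates,
# c.txt survives the size filter but is eliminated by the MD5 comparison.
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum \
  | sort | uniq -w32 --all-repeated=separate
```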

the explanation is as follows:

<pre>
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
</pre>

You probably want to redirect the output to a file, because the pipeline runs slowly.
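For example (the file name duplicates.txt is just illustrative):

```shell
# Save the duplicate listing to a file; you can follow progress from
# another terminal with: tail -f duplicates.txt
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum \
  | sort | uniq -w32 --all-repeated=separate > duplicates.txt
```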

If I understand this correctly:

Line 1 lists the size in bytes of every non-empty regular file.
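A quick check of that first stage (file names are illustrative; `-printf` is a GNU find feature):

```shell
# Print the size in bytes of each non-empty regular file, one per line.
dir=$(mktemp -d)
printf 'abc'   > "$dir/x"      # 3 bytes
printf 'abcde' > "$dir/y"      # 5 bytes
: > "$dir/empty"               # 0 bytes, excluded by -not -empty
find "$dir" -not -empty -type f -printf "%s\n"
```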
Line 2 sorts the sizes numerically, in descending order.
Line 3 strips out the sizes that appear only once, keeping one copy of each repeated size.
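Lines 2 and 3 together can be checked with a few sample sizes; `uniq` only spots repeats on adjacent lines, which is why the `sort` comes first:

```shell
# Only the size 6 occurs more than once, so uniq -d prints it exactly once.
printf '6\n42\n6\n6\n7\n' | sort -rn | uniq -d
# prints: 6
```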
For each remaining size, line 4 finds all the files of that size.
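The `c` suffix on `-size` means an exact byte count, and `-print0` emits NUL-terminated names so that file names containing spaces survive the pipe into `xargs -0` (file names here are illustrative):

```shell
dir=$(mktemp -d)
printf 'abcdef' > "$dir/six bytes"   # 6 bytes, space in the name on purpose
printf 'abc'    > "$dir/three"       # 3 bytes, does not match
# -size 6c matches files of exactly 6 bytes; the NUL separator keeps
# "six bytes" together as a single argument to md5sum.
find "$dir" -type f -size 6c -print0 | xargs -0 md5sum
```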
Line 5 computes the MD5 hash for all the files found in line 4, outputting the MD5 hash and file name. (This is repeated for each set of files of a given size.)
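The output format of `md5sum` is what the later stages rely on: a 32-hex-digit hash, two spaces, then the file name:

```shell
# md5sum prints "<32-character hash>  <file name>" for each argument.
f=$(mktemp)
printf 'hello\n' > "$f"
md5sum "$f"
```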
Line 6 sorts that list so that identical hashes end up on adjacent lines, for easy comparison.
Line 7 compares the first 32 characters of each line (the MD5 hash) to find duplicates.
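Lines 6 and 7 on sample `md5sum`-style input (the hashes and names below are made-up sample data): `-w32` restricts the comparison to the hash, and `--all-repeated=separate` prints every member of each duplicate group, blank-line separated, while dropping unique hashes entirely:

```shell
printf '%s\n' \
  "d41d8cd98f00b204e9800998ecf8427e  one" \
  "d41d8cd98f00b204e9800998ecf8427e  two" \
  "b1946ac92492d2347c6235b4d2611184  three" \
  | sort | uniq -w32 --all-repeated=separate
# the two "d41d..." lines are printed; the unique "b194..." line is dropped
```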
Line 8 spits out the file name and path part of the matching lines.
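Why `-f3-` and not `-f2-`: `md5sum` separates hash and name with *two* spaces, so with a single-space delimiter field 1 is the hash, field 2 is empty, and the name starts at field 3 (the hash and name below are sample data):

```shell
# Fields with -d" ": 1 = hash, 2 = empty (between the two spaces), 3- = name.
printf 'b1946ac92492d2347c6235b4d2611184  my file.txt\n' | cut -d" " -f3-
# prints: my file.txt
```

Using `-f3-` rather than `-f3` also keeps file names that themselves contain spaces intact.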