Jeremias Keihsler, 09.01.2025 15:28
h1. Bash find duplicate files

This one-liner is taken from:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

It is explained at:
http://heyrod.com/snippet/t/linux.html

<pre><code class="bash">
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code></pre>

The same pipeline, split into numbered stages and extended by a final @cut@ that prints only the file names:

<pre>
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
</pre>

The command runs slowly on large directory trees, so you will probably want to redirect its output to a file.

If I understand this correctly:

Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, in descending order.
Line 3 keeps only the sizes that occur more than once.
For each remaining size, line 4 finds all the files of that size.
Line 5 computes the MD5 hash of every file found in line 4, outputting the MD5 hash and file name. (This is repeated for each set of files of a given size.)
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and prints every line whose hash occurs more than once, separating the groups with a blank line.
Line 8 strips the hash and prints only the path of each matching file.

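The steps above can be wrapped in a small shell function so that the search directory becomes a parameter. This is only a sketch and not part of the original one-liner: the name @list_duplicates@ is hypothetical, GNU find, xargs, md5sum and uniq are assumed, @-n1@ is dropped because @-I@ already runs one @find@ per size, and @xargs -r@ is added so that @md5sum@ is not started when nothing matches.

```shell
# list_duplicates [DIR]: hypothetical wrapper around the pipeline above.
# Assumes GNU find, xargs, md5sum and uniq. DIR defaults to the current directory.
list_duplicates() {
    local dir=${1:-.}
    # 1. print the size of every non-empty regular file
    find "$dir" -not -empty -type f -printf '%s\n' |
        # 2./3. keep one copy of each size that occurs more than once
        sort -rn |
        uniq -d |
        # 4. list all files of each remaining size, NUL-separated
        xargs -I{} find "$dir" -type f -size {}c -print0 |
        # 5. hash the candidates (-r: do nothing if there are none)
        xargs -0 -r md5sum |
        # 6./7. group identical hashes, blank line between groups
        sort |
        uniq -w32 --all-repeated=separate |
        # 8. drop the hash, keep the path
        cut -d' ' -f3-
}
```

For example, @list_duplicates /dir1 > duplicates.txt@ writes the groups of duplicate paths to a file instead of the terminal.
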
h2. fdupes

fdupes is another program that identifies duplicate files on your system. It is free, open-source and written in C. It uses the following methods to determine duplicate files:

* Comparing partial md5sum signatures
* Comparing full md5sum signatures
* Byte-by-byte comparison as verification

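The staged checks listed above can be sketched for a single pair of files. This only illustrates the idea of running cheap tests before expensive ones and is not fdupes' actual implementation: the function name @same_content@, the initial size check and the 4096-byte prefix are assumptions.

```shell
# same_content FILE1 FILE2: hypothetical helper, returns 0 if both files
# have identical content and a non-zero status otherwise.
same_content() {
    local a=$1 b=$2
    # Cheapest test first: files of different size can never be duplicates.
    [ "$(wc -c < "$a")" = "$(wc -c < "$b")" ] || return 1
    # Partial signature: hash only the first 4096 bytes (size is an assumption).
    [ "$(head -c 4096 < "$a" | md5sum)" = "$(head -c 4096 < "$b" | md5sum)" ] || return 1
    # Full signature: hash the whole file.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # Final verification: compare byte by byte to rule out a hash collision.
    cmp -s -- "$a" "$b"
}
```

A call such as @same_content file1 file2 && echo duplicate@ prints a message only when every stage agrees.
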
Like rdfind, it offers the following options:

* Search recursively
* Exclude empty files
* Show the size of duplicate files
* Delete duplicates immediately
* Exclude files with a different owner

To install fdupes on a Linux distribution that uses @dnf@, run:

<pre><code class="shell">
dnf install fdupes
</code></pre>

60 | |||
61 | <pre><code class="shell"> |
||
62 | fdupes --recurse /dir1 /dir2 |
||
63 | </code></pre> |
||
64 | |||
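Several of the options listed earlier correspond to long flags of fdupes. The sketch below combines three of them. The wrapper name @report_duplicates@ is hypothetical, and the installation check is only there so that the function fails cleanly where fdupes is missing.

```shell
# report_duplicates DIR...: hypothetical wrapper around fdupes.
report_duplicates() {
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # --recurse: descend into subdirectories
    # --noempty: ignore zero-length files
    # --size:    print the size of the files in each duplicate set
    fdupes --recurse --noempty --size "$@"
}
```

Called as @report_duplicates /dir1 /dir2@, it prints each set of duplicates together with the size of its files.
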
To choose which files to keep, add @-d@. After selecting and marking the files, just press @<DEL>@ to carry out the deletion:

<pre><code class="shell">
fdupes --recurse -d /dir1 /dir2
</code></pre>

Cheat-sheet for the selection commands:

<pre>
selb <string> ... select all entries starting with <string>
ks            ... mark selection to keep
ds            ... mark selection to delete
csel          ... clear selection
</pre>