Bash find duplicate files » History » Version 1

Jeremias Keihsler, 24.07.2022 10:07

h1. Bash find duplicate files

This one-liner is taken from:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and is explained at:
http://heyrod.com/snippet/t/linux.html

<pre><code class="bash">
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code></pre>

The explanation is as follows (note that it adds a final @cut@ stage that the one-liner above lacks):

<pre>
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
</pre>

You probably want to redirect the output to a file, as the command runs slowly.

If I understand this correctly:

Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, largest first.
Line 3 keeps only the sizes that appear more than once.
For each remaining size, line 4 finds all the files of that size.
Line 5 computes the MD5 hash of every file found by line 4, outputting the hash and the file name for each.
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and keeps only the lines whose hash occurs more than once, separating each group of duplicates with a blank line.
Line 8 strips the hash and prints only the path and file name of the matching lines.

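
As a sketch, the eight steps above can be wrapped in a shell function that takes the directory to scan as an argument. The name @find_dupes@ is hypothetical and not part of the original one-liner, and the @-r@ flags on @xargs@ are an addition that skips running a command on empty input. It assumes GNU find, xargs, md5sum and uniq, and file names without newlines.

```shell
# find_dupes [DIR]: print groups of identical files below DIR (default: .),
# one path per line, with a blank line between groups.
# Hypothetical helper; assumes GNU find, xargs, md5sum and uniq.
find_dupes() {
    # 1-3: list file sizes and keep only the sizes that occur more than once
    find "${1:-.}" -not -empty -type f -printf '%s\n' |
        sort -rn |
        uniq -d |
        # 4: for every such size, list the files of exactly that size
        xargs -r -I{} find "${1:-.}" -type f -size {}c -print0 |
        # 5-6: hash the candidates and sort by hash
        xargs -0 -r md5sum |
        sort |
        # 7: keep only lines whose first 32 characters (the hash) repeat
        uniq -w32 --all-repeated=separate |
        # 8: drop the hash, keep the path
        cut -d' ' -f3-
}
```

A call such as @find_dupes /dir1 > duplicates.txt@ writes the result to a file instead of the terminal.
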
h2. fdupes

Fdupes is another program for identifying duplicate files on your system. It is free, open-source, and written in C. It determines duplicates using the following methods:

* Comparing partial md5sum signatures
* Comparing full md5sum signatures
* Byte-by-byte comparison for verification
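
The staged comparison listed above can be illustrated with standard tools. This is only a sketch of the idea, not fdupes' actual implementation: the function name @same_content@, the initial size check and the 4096-byte partial block are assumptions made for illustration.

```shell
# same_content A B: succeed (exit status 0) only if files A and B hold identical bytes.
# Hypothetical helper; each stage is more expensive than the one before it.
same_content() {
    # Cheap pre-check (an assumption, not in the list above): sizes must match.
    [ "$(wc -c < "$1")" = "$(wc -c < "$2")" ] || return 1
    # Partial signature: hash only the first 4096 bytes (block size assumed).
    [ "$(head -c 4096 < "$1" | md5sum)" = "$(head -c 4096 < "$2" | md5sum)" ] || return 1
    # Full signature: hash the whole file.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1
    # Final verification: a byte-by-byte comparison rules out hash collisions.
    cmp -s "$1" "$2"
}
```

For example, @same_content a.jpg b.jpg && echo duplicate@ prints @duplicate@ only when both files are byte-for-byte identical.
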

Like rdfind, it offers options to:

* Search recursively
* Exclude empty files
* Show the size of duplicate files
* Delete duplicates immediately
* Exclude files with a different owner

fdupes is available from the package repositories of most Linux distributions. Once it is installed, search directories recursively for duplicates:

<pre><code class="shell">
fdupes --recurse /dir1 /dir2
</code></pre>
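
A read-only report is a reasonable first step before deleting anything. The wrapper below is a sketch: @report_dupes@ is a hypothetical name, and the long options @--noempty@ and @--size@ are quoted from the fdupes manual page, so confirm them with @fdupes --help@ on your version.

```shell
# report_dupes DIR...: list duplicate sets below the given directories
# without deleting anything. Hypothetical wrapper around fdupes.
report_dupes() {
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # --recurse: descend into subdirectories
    # --noempty: leave zero-length files out of consideration
    # --size:    show the size of the duplicate files
    fdupes --recurse --noempty --size "$@"
}
```

Calling @report_dupes /dir1 /dir2@ prints the duplicate sets and leaves every file in place.
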

To choose which files to keep, add @-d@; after selecting the files, just press @<DEL>@:

<pre><code class="shell">
fdupes --recurse -d /dir1 /dir2
</code></pre>