Version 1 - History - Bash find duplicate files - Fedora 36 - OMB Redmine

Jeremias Keihsler

h1. Bash find duplicate files

This one-liner is taken from:

http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and is explained at:

http://heyrod.com/snippet/t/linux.html

<pre><code class="bash">
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code></pre>

The annotated version below numbers each stage and appends a final @cut@ that strips the hash from the output:

<pre>
1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-
</pre>

You probably want to redirect the output to a file, as the pipeline runs slowly: line 4 rescans the whole directory tree once for every duplicated size.

If I understand this correctly:

Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, in descending order.
Line 3 keeps one copy of each size that appears more than once and drops the sizes that appear only once.
For each remaining size, line 4 finds all the files of exactly that size.
Line 5 computes the MD5 hash of every file found in line 4 and prints the hash followed by the file name.
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and prints only the lines whose hash repeats, with a blank line between groups.
Line 8 strips the hash and prints only the path and file name of each matching line.
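
The steps above can be wrapped in a small function. This is only a sketch, assuming GNU findutils and coreutils; the name @find_dupes@ and its directory argument are hypothetical additions, not part of the original one-liner. It drops @-n1@ because @-I{}@ already runs one @find@ per size, and adds @-r@ so that @md5sum@ is not run at all when no size repeats.

```shell
#!/usr/bin/env bash
# find_dupes [DIR]: print "hash  path" lines for files with identical content,
# one blank-line-separated group per hash. Hypothetical helper name.
find_dupes() {
    local dir=${1:-.}
    # 1. sizes of all non-empty regular files
    find "$dir" -type f -not -empty -printf '%s\n' \
        | sort -rn \
        | uniq -d \
        | xargs -I{} find "$dir" -type f -size {}c -print0 \
        | xargs -0 -r md5sum \
        | sort \
        | uniq -w32 --all-repeated=separate
    # sort -rn / uniq -d : keep only sizes that occur more than once
    # xargs ... find     : list every file that has one of those sizes
    # md5sum / sort      : hash the candidates and group equal hashes
    # uniq -w32          : print only lines whose first 32 characters repeat
}
```

Calling @find_dupes ~/Documents > duplicates.txt@ would write the groups to a file; appending @| cut -d" " -f3-@ inside the function would reduce each line to the path, as in line 8.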

h2. fdupes

Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:

* Comparing partial md5sum signatures
* Comparing full md5sum signatures
* Byte-by-byte comparison as final verification
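
The staged comparison listed above can be illustrated for a single pair of files. This is a sketch of the general idea only, not fdupes' actual code: the function name @same_content@ is hypothetical, the initial size check and the 4096-byte prefix are assumptions chosen for the example, and @stat -c@ assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# same_content FILE_A FILE_B: exit 0 only if both files hold identical bytes.
# Hypothetical helper; cheap checks run first so most non-duplicates fail early.
same_content() {
    local a=$1 b=$2
    # Files of different size can never be duplicates.
    [ "$(stat -c %s -- "$a")" -eq "$(stat -c %s -- "$b")" ] || return 1
    # Partial signature: hash only the first 4096 bytes (arbitrary prefix length).
    [ "$(head -c 4096 -- "$a" | md5sum)" = "$(head -c 4096 -- "$b" | md5sum)" ] || return 1
    # Full signature: hash the whole file (stdin keeps the file name out of the output).
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # Final verification: byte-by-byte comparison rules out a hash collision.
    cmp -s -- "$a" "$b"
}
```

For example, @same_content a.iso b.iso && echo duplicate@ prints @duplicate@ only when every stage agrees.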

Like rdfind, it offers options to:

* Search recursively
* Exclude empty files
* Show the size of duplicate files
* Delete duplicates immediately
* Exclude files with a different owner

On Fedora, fdupes can be installed with @sudo dnf install fdupes@. To search one or more directories recursively for duplicates:

<pre><code class="shell">
fdupes --recurse /dir1 /dir2
</code></pre>

To choose which files to keep, add @-d@; after selecting the files, just press @<DEL>@:

<pre><code class="shell">
fdupes --recurse -d /dir1 /dir2
</code></pre>
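
The non-destructive options listed earlier appear to map to the long flags used below. This is a sketch assuming a reasonably recent fdupes; the wrapper name @list_dupes@ is hypothetical, and @fdupes --help@ on your system is the authority for the flags actually available.

```shell
#!/usr/bin/env bash
# list_dupes DIR...: list duplicate groups without deleting anything.
# Hypothetical wrapper name; flags as documented in the fdupes man page.
list_dupes() {
    # --recurse      descend into subdirectories
    # --noempty      skip zero-length files
    # --size         print the size of the files in each duplicate group
    # --permissions  do not treat files with different owner/group or mode as duplicates
    fdupes --recurse --noempty --size --permissions "$@"
}
```

Immediate deletion would, as far as I know, combine @--delete@ with @--noprompt@, which keeps the first file of each group and removes the rest without asking; it is left out of the sketch because it is destructive.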
