Finding and deleting duplicate files
If you restore old backups, use your laptop offline, or download images from a phone, you will eventually end up with duplicates: files with identical content. You will probably want to remove the duplicates and keep a single copy. We can identify duplicate files by examining their content with shell utilities. This recipe describes how to find duplicate files and perform operations based on the result.
Getting ready
We identify duplicate files by comparing file content. Checksums are ideal for this task: files with identical content produce identical checksum values.
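A quick way to see this in action is with `md5sum` (any checksum tool such as `sha1sum` or `sha256sum` behaves the same way). The file names `a` and `b` below are just placeholders for this demonstration:

```shell
# Two files with the same content produce the same checksum
echo "hello" > a
cp a b
md5sum a b
# Both output lines show the same hash, followed by the file name
```

Conversely, changing even one byte of a file yields a completely different checksum, which is what makes checksums a reliable fingerprint for content comparison.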
How to do it...
Follow these steps to find and delete duplicate files:
- Generate some test files:
$ echo "hello" > test
$ cp test test_copy1
$ cp test test_copy2
$ echo "next" > other
# test_copy1 and test_copy2 are copies of test
- The code for the script to remove the duplicate files uses awk, an interpreter that's available on all Linux/Unix systems...
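Before looking at the full script, the core idea can be sketched as a short pipeline: checksum every file, sort so that identical hashes become adjacent lines, and let awk report any file whose hash matches the previous line. This is a minimal sketch of the technique, not the book's exact script, and it assumes filenames without embedded spaces:

```shell
# List duplicate files in the current directory by checksum.
# sort groups identical hashes together; awk prints the second and
# later files sharing each hash (the redundant copies).
find . -maxdepth 1 -type f -exec md5sum {} + | sort | awk '
$1 == prev { print $2 }   # same hash as the previous line: a duplicate
{ prev = $1 }'
```

Run against the test files created above, this prints `./test_copy1` and `./test_copy2` while leaving the first copy (`./test`) and the unique file (`./other`) unlisted; the output could then be fed to `rm` to delete the redundant copies.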