Working with Duplicate Records/Lines in Text: A Quick Guide to the `uniq` Command
The `uniq` command is a powerful tool for working with duplicate lines of text. Whether you need to extract duplicate lines from a file or process the output of another command through a pipe, `uniq` has you covered.
To get started with `uniq`, keep in mind that it only detects *adjacent* duplicate lines. This means that combining it with the `sort` command yields the best results:

```sh
sort dogs.txt | uniq
```
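To see why sorting matters, here is a minimal sketch using a hypothetical `dogs.txt` in which the duplicate lines are not adjacent:

```sh
# Hypothetical dogs.txt: the two "beagle" lines are separated by "poodle".
printf 'beagle\npoodle\nbeagle\n' > dogs.txt

# uniq alone keeps all three lines, because the duplicates are not adjacent.
uniq dogs.txt

# Sorting first groups the duplicates together, so only one "beagle" survives.
sort dogs.txt | uniq
```

The second pipeline prints just two lines: `beagle` and `poodle`.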
While the `sort` command has its own `-u` option to remove duplicates, `uniq` provides additional capabilities. By default, `uniq` removes duplicate lines from the input, but you can use the `-d` option to display only the duplicated lines:

```sh
sort dogs.txt | uniq -d
```
If you’re interested in displaying only the non-duplicate lines, use the `-u` option:

```sh
sort dogs.txt | uniq -u
```
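A quick sketch of how `-d` and `-u` partition the input, again with a hypothetical `dogs.txt`:

```sh
# Hypothetical dogs.txt: "beagle" appears twice, "poodle" once.
printf 'beagle\nbeagle\npoodle\n' > dogs.txt

sort dogs.txt | uniq -d   # prints only the duplicated line: beagle
sort dogs.txt | uniq -u   # prints only the line that occurs once: poodle
```

Note that `-d` prints each duplicated line once, no matter how many times it repeats.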
For a count of occurrences of each line, the `-c` option comes in handy:

```sh
sort dogs.txt | uniq -c
```
To add more complexity, you can combine multiple commands to sort the lines by most frequent occurrence:

```sh
sort dogs.txt | uniq -c | sort -nr
```
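Here is that frequency-ranking pipeline end to end, with a hypothetical `dogs.txt`. The `-n` flag makes the second `sort` compare the counts numerically, and `-r` reverses the order so the highest count comes first:

```sh
# Hypothetical dogs.txt: "poodle" three times, "beagle" twice.
printf 'poodle\nbeagle\npoodle\npoodle\nbeagle\n' > dogs.txt

# Count each line, then sort by count, descending.
sort dogs.txt | uniq -c | sort -nr
```

The most frequent breed (`poodle`, with a count of 3) appears on the first line of the output.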
The beauty of the `uniq` command is that it works not only on Linux, but also on macOS, WSL (Windows Subsystem for Linux), and any other UNIX environment available to you.
Now that you’re equipped with this quick guide, you can effectively handle duplicate records and lines in your text using the `uniq` command.
tags: ["Linux commands", "uniq", "sorting", "duplicate lines", "text processing"]