
Working with Duplicate Records/Lines in Text: A Quick Guide to the `uniq` Command


The uniq command is a simple but powerful tool for working with duplicate records, or lines, in text. Note that it does not sort its input; it filters adjacent duplicate lines. Whether you need to extract duplicate lines from a file or process the output of another command through a pipe, uniq has you covered.

To get started with uniq, keep in mind that it detects adjacent duplicate lines by default. This means that combining it with the sort command will yield the best results:

sort dogs.txt | uniq
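To see why sorting matters, here is a quick sketch. The dogs.txt contents below are made up for illustration; the point is that one breed appears twice, but not on adjacent lines:

```shell
# Hypothetical sample data: "beagle" appears twice, on non-adjacent lines.
printf 'beagle\npoodle\nbeagle\n' > dogs.txt

# uniq alone only collapses adjacent duplicates, so both "beagle" lines survive:
uniq dogs.txt
# beagle
# poodle
# beagle

# Sorting first groups the duplicates together, so uniq can collapse them:
sort dogs.txt | uniq
# beagle
# poodle
```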

While the sort command has its own -u option to remove duplicates, uniq provides additional capabilities. By default, uniq collapses each run of adjacent duplicate lines into a single line. You can instead use the -d option to display only the lines that are duplicated:

sort dogs.txt | uniq -d
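A sketch of -d, using made-up breed data in place of dogs.txt: each duplicated line is printed once, and lines occurring only once are omitted:

```shell
# Hypothetical input: "beagle" and "corgi" are duplicated, "poodle" is not.
printf 'beagle\ncorgi\npoodle\nbeagle\ncorgi\n' | sort | uniq -d
# beagle
# corgi
```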

If you’re interested in displaying only the lines that appear exactly once, use the -u option:

sort dogs.txt | uniq -u
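With the same made-up sample data, -u is the mirror image of -d, keeping only the lines that occur a single time:

```shell
# Hypothetical input: only "poodle" appears exactly once.
printf 'beagle\ncorgi\npoodle\nbeagle\ncorgi\n' | sort | uniq -u
# poodle
```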

For a count of occurrences for each line, the -c option comes in handy:

sort dogs.txt | uniq -c
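Sketching -c with hypothetical input: each output line is prefixed with the number of times it occurred (the exact column padding before the count varies by implementation):

```shell
# Hypothetical input: "beagle" twice, "corgi" three times.
printf 'corgi\nbeagle\ncorgi\ncorgi\nbeagle\n' | sort | uniq -c
#       2 beagle
#       3 corgi
```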

Going a step further, you can chain commands to rank the lines by frequency, most common first:

sort dogs.txt | uniq -c | sort -nr
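Here is the full pipeline sketched end to end on hypothetical data: sort groups the duplicates, uniq -c counts each group, and sort -nr orders the counts numerically, descending:

```shell
# Hypothetical input ranked from most to least frequent.
printf 'corgi\nbeagle\ncorgi\ncorgi\nbeagle\npoodle\n' | sort | uniq -c | sort -nr
#       3 corgi
#       2 beagle
#       1 poodle
```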

The beauty of the uniq command is that it works not only on Linux, but also on macOS, WSL (Windows Subsystem for Linux), and any UNIX environment available to you.

Now that you’re equipped with this quick guide, you can effectively handle duplicate records and lines in your text using the uniq command.

tags: ["Linux commands", "uniq", "sorting", "duplicate lines", "text processing"]