Comm Command on Linux with Examples
The comm command in Linux is a powerful yet underappreciated utility that compares two sorted files line by line. It produces three kinds of lines: those unique to the first file, those unique to the second file, and those common to both. This makes comm a valuable tool for tasks such as identifying unique or shared entries in datasets, simplifying data analysis, and aiding in file management.
Prerequisites
Before leveraging the comm command, ensure that:
- The files to be compared are sorted. If not, use the sort command to sort them beforehand.
- The sorting order (collating sequence) is consistent across both files.
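As a quick sketch of these prerequisites, sort -c can confirm whether a file is already sorted before it is handed to comm. The file name fruits.txt and its contents are made up for illustration.

```shell
# Work in a throwaway directory; fruits.txt is a hypothetical sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' cherry apple banana > fruits.txt

# sort -c exits with a non-zero status when the file is not sorted.
if sort -c fruits.txt 2>/dev/null; then
    sorted_before=yes
else
    sorted_before=no
fi

# Sort into a new file, then run the same check again.
sort fruits.txt > fruits.sorted.txt
if sort -c fruits.sorted.txt; then
    sorted_after=yes
else
    sorted_after=no
fi

echo "before: $sorted_before, after: $sorted_after"
# before: no, after: yes
```

Writing the sorted data to a separate file keeps the original untouched, which helps when the unsorted order still matters elsewhere.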
Syntax and Options
The basic syntax of the comm command is:
comm [OPTION]... FILE1 FILE2
The standard output consists of three columns:
- Lines unique to FILE1
- Lines unique to FILE2
- Lines common to both files
The comm command comes with several options that allow you to customize its behavior:
-1: Suppresses the first column (lines unique to the first file).
-2: Suppresses the second column (lines unique to the second file).
-3: Suppresses the third column (lines common to both files).
--check-order: Checks that the input is correctly sorted.
--nocheck-order: Does not check whether the inputs are sorted or not.
--output-delimiter=STR: Separates columns with the string STR.
--version: Outputs version information.
For example, if you want to suppress the first column (lines unique to the first file), you can use the -1 option:
comm -1 file1.txt file2.txt
This will output two columns. The first column contains lines unique to file2.txt, and the second column contains lines common to both files.
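To make the two-column layout concrete, here is a minimal sketch with invented sample data. The file contents are assumptions chosen only for illustration.

```shell
# Hypothetical sample files, already sorted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry > file1.txt
printf '%s\n' banana cherry date > file2.txt

# With -1, lines unique to file2.txt are not indented,
# and lines common to both files are preceded by one tab.
result=$(comm -1 file1.txt file2.txt)
printf '%s\n' "$result"
# Output (<TAB> marks a tab character):
# <TAB>banana
# <TAB>cherry
# date
```

Note that apple, which appears only in file1.txt, is absent from the output.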
Basic Usage
To compare two sorted files without any options:
comm file1.txt file2.txt
This command will display three columns as described above.
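The three columns are distinguished by tab indentation rather than by headers. The sketch below uses made-up file contents to show which line lands in which column.

```shell
# Sample data (hypothetical); both files are already sorted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry > file1.txt
printf '%s\n' banana cherry date > file2.txt

# Column 1 has no indent, column 2 is preceded by one tab,
# and column 3 is preceded by two tabs.
result=$(comm file1.txt file2.txt)
printf '%s\n' "$result"
# Output (<TAB> marks a tab character):
# apple
# <TAB><TAB>banana
# <TAB><TAB>cherry
# <TAB>date
```

Here apple is unique to file1.txt, date is unique to file2.txt, and banana and cherry are shared.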
Advanced Usage and Examples
Suppressing Columns
To view only the common lines between two files, use:
comm -12 file1.txt file2.txt
This suppresses the first two columns, showing only the third.
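The same idea extends to the other column combinations. The following sketch, built on invented sample files, captures each of the three result sets separately.

```shell
# Hypothetical sample files, already sorted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry > file1.txt
printf '%s\n' banana cherry date > file2.txt

common=$(comm -12 file1.txt file2.txt)       # lines present in both files
only_first=$(comm -23 file1.txt file2.txt)   # lines only in file1.txt
only_second=$(comm -13 file1.txt file2.txt)  # lines only in file2.txt

printf 'common:\n%s\n' "$common"
printf 'only in file1.txt:\n%s\n' "$only_first"
printf 'only in file2.txt:\n%s\n' "$only_second"
```

When only one column remains, comm prints it without leading tabs, so the output can be piped straight into other tools.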
Comparing Unsorted Files
For unsorted files, combine comm with sort using process substitution:
comm -12 <(sort file1.txt) <(sort file2.txt)
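Process substitution, the <(...) syntax, is provided by shells such as bash, zsh, and ksh, but not by a plain POSIX sh. Where it is unavailable, one possible alternative is to sort into temporary files first. The file names and contents below are assumptions for illustration.

```shell
# Hypothetical unsorted sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' cherry apple banana > file1.txt
printf '%s\n' date banana cherry > file2.txt

# Sort each input into a temporary file, then compare the sorted copies.
sort file1.txt > file1.sorted
sort file2.txt > file2.sorted
common=$(comm -12 file1.sorted file2.sorted)
printf '%s\n' "$common"
# banana
# cherry

# Remove the temporary sorted copies once they are no longer needed.
rm -f file1.sorted file2.sorted
```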
Custom Output Delimiter
To improve readability, specify a custom delimiter:
comm --output-delimiter=" | " file1.txt file2.txt
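With a custom delimiter, each tab that would normally indent a column is replaced by the given string. The sketch below assumes GNU comm, since --output-delimiter is a GNU long option, and uses invented sample data.

```shell
# Hypothetical sample files, already sorted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry > file1.txt
printf '%s\n' banana cherry date > file2.txt

# Column 2 gets one delimiter in front of it, column 3 gets two.
result=$(comm --output-delimiter=" | " file1.txt file2.txt)
printf '%s\n' "$result"
# apple
#  |  | banana
#  |  | cherry
#  | date
```

Counting the delimiters at the start of a line tells you which column it belongs to.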
Troubleshooting Tips
- If comm reports that files are not sorted, recheck the sorting order.
- Ensure file encoding does not affect sorting.
- Use LC_COLLATE=C sort file.txt for consistent sorting across different environments, and run comm under the same locale setting so that both tools agree on the order.
- For more detailed information about the comm command, you can explore the man page:
man comm
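As a sketch of the locale tip, the example below sorts and compares under the C locale. It sets LC_ALL rather than LC_COLLATE because LC_ALL takes precedence when both are defined in the environment. The file names a.txt and b.txt and their contents are hypothetical.

```shell
# Hypothetical sample files with mixed-case entries.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple Banana cherry > a.txt
printf '%s\n' Banana cherry date > b.txt

# In the C locale, lines are compared by byte value,
# so uppercase letters sort before lowercase ones.
LC_ALL=C sort a.txt > a.sorted.txt
LC_ALL=C sort b.txt > b.sorted.txt

# Run comm under the same locale that was used for sorting.
common=$(LC_ALL=C comm -12 a.sorted.txt b.sorted.txt)
printf '%s\n' "$common"
# Banana
# cherry
```

Using one locale for both sort and comm avoids spurious "not in sorted order" complaints caused by differing collation rules.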
Conclusion
The comm command is a testament to the versatility and power of Linux command-line tools. Its ability to efficiently compare sorted files makes it an essential utility for data analysis, system administration, and beyond. By mastering comm, users can streamline their workflows, uncover insights from datasets, and perform complex file comparisons with ease. For a deeper dive, readers are encouraged to explore the comm man page and experiment with the command to fully grasp its capabilities and applications.