The comm command in Linux is a powerful yet underappreciated utility that compares two sorted files line by line. It outputs three types of lines: those unique to the first file, those unique to the second file, and those common to both. This makes comm useful for tasks such as identifying unique or shared entries in datasets, simplifying data analysis, and aiding in file management.
Before using the comm command, ensure that:
- The files to be compared are sorted. If not, use the sort command to sort them beforehand.
- The sorting order (collating sequence) is consistent across both files.
Syntax and Options
The basic syntax of the comm command is:
comm [OPTION]... FILE1 FILE2
The standard output consists of three columns:
- Lines unique to FILE1
- Lines unique to FILE2
- Lines common to both files
The comm command comes with several options that allow you to customize its behavior:
-1: Suppresses the first column (lines unique to the first file).
-2: Suppresses the second column (lines unique to the second file).
-3: Suppresses the third column (lines common to both files).
--check-order: Checks that the input is correctly sorted, even if all input lines are pairable.
--nocheck-order: Does not check whether the inputs are sorted.
--output-delimiter=STR: Separates columns with the string STR.
--version: Outputs version information and exits.
For example, to suppress the first column (lines unique to the first file), use the -1 option:
comm -1 file1.txt file2.txt
This outputs two columns. The first contains lines unique to file2.txt, and the second contains lines common to both files.
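As a small illustration of the -1 output, the sketch below builds two sorted sample files and runs the command on them. The file contents and the scratch directory are made up for this example.

```shell
# Hypothetical sample data: both files are already sorted.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/file2.txt"

# -1 drops the column of lines unique to file1.txt.
# Lines unique to file2.txt start at the left margin;
# lines common to both files are indented by one tab.
without_first=$(comm -1 "$tmpdir/file1.txt" "$tmpdir/file2.txt")
printf '%s\n' "$without_first"

rm -rf "$tmpdir"
```

Here "apple" disappears from the output because it exists only in file1.txt.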
To compare two sorted files without any options:
comm file1.txt file2.txt
This command will display three columns as described above.
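To make the three-column layout concrete, here is a minimal sketch with two tiny sorted files. The sample data is invented for illustration. By default comm separates the columns with tabs, so the indentation shows which column a line belongs to.

```shell
# Hypothetical sample data: both files are already sorted.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/file2.txt"

# No indent      -> line only in file1.txt (column 1)
# One tab        -> line only in file2.txt (column 2)
# Two tabs       -> line present in both   (column 3)
three_columns=$(comm "$tmpdir/file1.txt" "$tmpdir/file2.txt")
printf '%s\n' "$three_columns"

rm -rf "$tmpdir"
```

With this data, "apple" lands in column 1, "date" in column 2, and "banana" and "cherry" in column 3.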
Advanced Usage and Examples
To view only the common lines between two files, use:
comm -12 file1.txt file2.txt
This suppresses the first two columns, showing only the third.
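The same idea extends to the other two columns: suppressing columns 2 and 3 leaves only the lines unique to the first file, and suppressing columns 1 and 3 leaves only those unique to the second. The sketch below uses made-up sample names to show all three selections side by side.

```shell
# Hypothetical sample data: both files are already sorted.
tmpdir=$(mktemp -d)
printf 'alice\nbob\ncarol\n' > "$tmpdir/file1.txt"
printf 'bob\ncarol\ndave\n'  > "$tmpdir/file2.txt"

# Lines present in both files (only column 3 remains).
common=$(comm -12 "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Lines present only in file1.txt (only column 1 remains).
only_first=$(comm -23 "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Lines present only in file2.txt (only column 2 remains).
only_second=$(comm -13 "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf 'common:\n%s\n' "$common"
printf 'only in file1.txt:\n%s\n' "$only_first"
printf 'only in file2.txt:\n%s\n' "$only_second"

rm -rf "$tmpdir"
```

When only one column is left, comm prints it without any leading tabs, which makes the result easy to pipe into other tools.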
Comparing Unsorted Files
For unsorted files, combine comm with sort using process substitution (a feature of shells such as bash and zsh):
comm -12 <(sort file1.txt) <(sort file2.txt)
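Process substitution is not available in plain POSIX sh. A rough equivalent, assuming temporary files are acceptable, is to sort each input into a scratch file first. The file names and contents below are hypothetical.

```shell
# Hypothetical unsorted sample data.
tmpdir=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$tmpdir/file1.txt"
printf 'date\nbanana\ncherry\n'  > "$tmpdir/file2.txt"

# Sort each input into its own temporary file.
sort "$tmpdir/file1.txt" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2.txt" > "$tmpdir/file2.sorted"

# Compare the sorted copies; the originals stay untouched.
common_sorted=$(comm -12 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$common_sorted"

rm -rf "$tmpdir"
```

This costs some extra disk I/O compared with process substitution, but it works in any shell.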
Custom Output Delimiter
To improve readability, specify a custom delimiter:
comm --output-delimiter=" | " file1.txt file2.txt
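The sketch below shows how the delimiter appears in practice, assuming GNU comm (--output-delimiter is a GNU extension and may be missing from other implementations). It uses a comma instead of " | " purely to keep the output compact, and the sample data is invented.

```shell
# Hypothetical sample data: both files are already sorted.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/file2.txt"

# The delimiter replaces the default tab: column 2 lines are prefixed
# with one delimiter, column 3 lines with two.
delimited=$(comm --output-delimiter=, "$tmpdir/file1.txt" "$tmpdir/file2.txt")
printf '%s\n' "$delimited"

rm -rf "$tmpdir"
```

Note that the delimiter is a column prefix, not a separator between filled cells, so a line in column 3 starts with two delimiters in a row.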
Troubleshooting
- If comm reports that files are not sorted, recheck the sorting order.
- Ensure file encoding does not affect sorting.
- Use LC_COLLATE=C sort file.txt for consistent sorting across different environments.
- For more detailed information about the comm command, you can explore the man page:
man comm
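The sorting tips above can be sketched as follows: check the order with sort -c, then sort and compare under one fixed locale. This example sets LC_ALL=C rather than LC_COLLATE=C because LC_ALL, when set in the environment, overrides LC_COLLATE. The file name and contents are hypothetical.

```shell
# Hypothetical sample data: unsorted and mixed-case.
tmpdir=$(mktemp -d)
printf 'cherry\napple\nBanana\n' > "$tmpdir/file.txt"

# sort -c only checks the order; it exits non-zero if the file is not sorted.
if LC_ALL=C sort -c "$tmpdir/file.txt" 2>/dev/null; then
    order_status="sorted"
else
    order_status="unsorted"
fi
echo "before sorting: $order_status"

# Sort in the C locale (plain byte order), where uppercase letters
# come before lowercase ones.
LC_ALL=C sort "$tmpdir/file.txt" > "$tmpdir/file.sorted"
sorted_lines=$(cat "$tmpdir/file.sorted")
printf '%s\n' "$sorted_lines"

# Run comm under the same locale so it agrees with sort about the order.
# Comparing the file with itself should return every line as common.
self_common=$(LC_ALL=C comm -12 "$tmpdir/file.sorted" "$tmpdir/file.sorted")

rm -rf "$tmpdir"
```

Using the same locale for both sort and comm is the important part; the C locale is simply a choice that behaves the same on every system.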
The comm command is a testament to the versatility and power of Linux command-line tools. Its ability to efficiently compare sorted files makes it an essential utility for data analysis, system administration, and beyond. By mastering comm, users can streamline their workflows, uncover insights from datasets, and perform complex file comparisons with ease. For a deeper dive, readers are encouraged to explore the comm man page and experiment with the command to fully grasp its capabilities and applications.