Comm Command on Linux with Examples

Comparing files is a fundamental task for system administrators, developers, and Linux enthusiasts alike. While there are several comparison tools available in Linux, the comm command stands out for its simplicity and effectiveness when working with sorted text files. This powerful utility provides a clear, columnar view of the differences and similarities between files, making it an invaluable tool for data analysis, configuration management, and scripting tasks.

In this comprehensive guide, we’ll explore the Linux comm command in depth, covering everything from basic usage to advanced techniques with practical examples. Whether you’re a beginner or an experienced Linux user, you’ll discover how to leverage this versatile command to streamline your file comparison workflows.

Understanding the Comm Command

The comm command, whose name comes from “common,” is a command-line utility in Linux that compares two sorted files line by line. Unlike comparison tools that report only differences, comm produces three-column output that also shows what the files share.

What makes comm unique:

  • First column shows lines unique to the first file
  • Second column displays lines unique to the second file
  • Third column presents lines common to both files

This three-column approach makes the comm command particularly useful for set operations on text files, allowing you to easily identify unique and shared content between datasets.

The comm command has been part of Unix systems since Version 4 Unix in the 1970s, where it was written by Lee E. McMahon. The version shipped in GNU coreutils was written by Richard Stallman and David MacKenzie. Its longevity speaks to its continued utility in modern Linux environments.

Installation and Verification

Before diving into examples, let’s ensure the comm command is available on your system. As part of the essential coreutils package, comm comes pre-installed on virtually all Linux distributions.

To verify comm is installed:

comm --version

Or alternatively:

which comm

If for some reason comm is not available, you can install or reinstall the coreutils package using your distribution’s package manager:

For Debian/Ubuntu systems:

sudo apt install --reinstall coreutils

For CentOS/Fedora systems (reinstall is a subcommand here, not an option; on recent releases dnf accepts the same subcommand):

sudo yum reinstall coreutils

If you still encounter a “command not found” error, it might indicate an issue with your system PATH. On most distributions the comm binary lives in /usr/bin, so check whether that directory is listed:

echo $PATH

And if necessary, add it to your PATH for the current shell session:

export PATH=$PATH:/usr/bin

Basic Syntax and Structure

The fundamental syntax of the comm command is straightforward:

comm [options] file1 file2

Both file1 and file2 should be sorted according to the current locale’s collating sequence for the command to work correctly. If you specify a hyphen (-) for one of the file names, comm will read from standard input instead.

The default output of comm displays three columns, separated by tab characters:

  1. Lines unique to file1
  2. Lines unique to file2
  3. Lines common to both files

This format makes it easy to distinguish between file contents at a glance, but as we’ll see, various options allow you to customize this output to suit your specific needs.

Command Options in Detail

The comm command supports several options that modify its behavior and output format. Here’s a comprehensive list of the available options:

Option                    Description
-1                        Suppresses the first column (lines unique to file1)
-2                        Suppresses the second column (lines unique to file2)
-3                        Suppresses the third column (lines common to both files)
--check-order             Checks that input is correctly sorted, even if all lines are pairable
--nocheck-order           Skips the sorting check on input files
--output-delimiter=STR    Separates columns with the specified string instead of tabs
--total                   Prints a final summary line with the number of lines in each column
-z, --zero-terminated     Uses NUL instead of newline as the line delimiter, for input and output
--help                    Displays a help message and exits
--version                 Outputs version information and exits

These options can be combined to create powerful custom comparisons. For example, combining -1 and -2 would show only the common lines between files, effectively performing an intersection operation.
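As one illustration of combining options, the sketch below pairs -123 with --total so that only the summary line is printed. It assumes a reasonably recent GNU coreutils (older releases lack --total); the workdir and summary variable names are purely illustrative.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf '%s\n' 001 056 127 258 > "$workdir/file1.txt"
printf '%s\n' 002 056 167 369 > "$workdir/file2.txt"

# -123 suppresses all three columns, leaving only the summary line from --total.
# The counts are tab-separated: unique to file1, unique to file2, common.
summary=$(comm -123 --total "$workdir/file1.txt" "$workdir/file2.txt")
echo "$summary"

rm -rf "$workdir"
```

With these two sample files the counts should be 3, 3 and 1, followed by the word "total".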

Basic Comparison Examples

To illustrate how comm works, let’s create two simple text files and compare them. First, we’ll create our test files:

file1.txt:

001
056
127
258

file2.txt:

002
056
167
369
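One way to create these two files from the shell is sketched below. Note that it writes file1.txt and file2.txt into the current directory, and the sort -c check is an optional extra that confirms each file is already in sorted order.

```shell
# Create the two sample files, one entry per line.
printf '%s\n' 001 056 127 258 > file1.txt
printf '%s\n' 002 056 167 369 > file2.txt

# sort -c exits non-zero (and names the first out-of-order line) if a file is unsorted.
sort -c file1.txt && sort -c file2.txt && echo "both files are sorted"
```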

Now, let’s compare these files using the basic comm command:

comm file1.txt file2.txt

The output will show:

001
	002
		056
127
	167
258
	369

Here, the first column shows lines unique to file1.txt (001, 127, 258), the second column (indented with a tab) shows lines unique to file2.txt (002, 167, 369), and the third column (indented with two tabs) shows the common line (056).

This basic comparison is useful for quickly identifying differences and similarities between files, but the real power of comm comes when we start manipulating these columns for specific purposes.

Column Manipulation Examples

The column suppression options (-1, -2, -3) allow you to focus on specific aspects of the comparison. Here are some practical examples:

To show only lines unique to file1 (suppress columns 2 and 3):

comm -23 file1.txt file2.txt

Output:

001
127
258

To show only lines unique to file2 (suppress columns 1 and 3):

comm -13 file1.txt file2.txt

Output:

002
167
369

To show only lines common to both files (suppress columns 1 and 2):

comm -12 file1.txt file2.txt

Output:

056

These column manipulations effectively perform set operations on text files:

  • comm -23 gives you the set difference (file1 – file2)
  • comm -13 gives you the set difference (file2 – file1)
  • comm -12 gives you the set intersection (file1 ∩ file2)
  • comm -3 gives you the symmetric difference (file1 ⊕ file2)

Using these column manipulations, you can quickly perform data analysis tasks like finding entries in one dataset but not another, or identifying common elements across datasets.
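The set operations listed above can be wrapped in small shell functions, as in the following sketch. The function names are hypothetical, both arguments are assumed to be files that are already sorted, and the symmetric-difference helper assumes the lines themselves contain no tab characters. Union is not a comm column selection, so that helper falls back to sort -u.

```shell
# Hypothetical helper names; both arguments must be files that are already sorted.
set_intersection() { comm -12 "$1" "$2"; }
set_difference() { comm -23 "$1" "$2"; }
# Flatten the two remaining columns into one list by deleting the indent tabs.
set_symmetric_difference() { comm -3 "$1" "$2" | tr -d '\t'; }
# Union: merge both inputs and drop duplicate lines.
set_union() { sort -u "$1" "$2"; }

# Small demonstration with the sample data from this guide.
workdir=$(mktemp -d)
printf '%s\n' 001 056 127 258 > "$workdir/file1.txt"
printf '%s\n' 002 056 167 369 > "$workdir/file2.txt"

set_intersection "$workdir/file1.txt" "$workdir/file2.txt"
set_symmetric_difference "$workdir/file1.txt" "$workdir/file2.txt"

rm -rf "$workdir"
```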

Working with Unsorted Files

One of the key requirements of the comm command is that input files must be sorted. If they aren’t, the command reports an error such as “comm: file 1 is not in sorted order” and the output may be incorrect.

There are two approaches to handling unsorted files:

1. Pre-sort the files using the sort command:

comm <(sort file1.txt) <(sort file2.txt)

This bash process substitution technique feeds comm sorted copies of your files as streams, without altering the originals and without leaving temporary files behind.

2. Use the --nocheck-order option:

comm --nocheck-order file1.txt file2.txt

This option tells comm to skip the sorting check and suppresses the error message, but it does not make comm handle unsorted data: the results may still be incorrect if the files aren’t actually sorted.

For reliable results, pre-sorting your files is the recommended approach. Reserve --nocheck-order for cases where you know the input is ordered consistently and only want to silence the check.
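Because comm expects the order defined by the current locale, pre-sorting works best when sort and comm run under the same locale. The sketch below pins both to the C locale (plain byte order); the file names and the workdir and common variables are illustrative, and temporary files are used instead of process substitution so the snippet also runs in a plain POSIX shell.

```shell
workdir=$(mktemp -d)
printf '%s\n' banana Apple cherry > "$workdir/a.txt"
printf '%s\n' apple Banana cherry > "$workdir/b.txt"

# Pin one collation (byte order) for both the sort step and the comm step.
LC_ALL=C sort "$workdir/a.txt" > "$workdir/a.sorted"
LC_ALL=C sort "$workdir/b.txt" > "$workdir/b.sorted"

# Only lines that match exactly, including case, count as common.
common=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")
echo "$common"

rm -rf "$workdir"
```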

Customizing Output Format

By default, the comm command separates columns with tab characters, which can sometimes make the output difficult to read, especially when redirecting to other commands. The --output-delimiter option allows you to specify a different separator:

comm --output-delimiter="| " file1.txt file2.txt

This keeps the same staircase layout as the default output, with each leading tab replaced by “| ”:

001
| 002
| | 056
127
| 167
258
| 369

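When working with the output programmatically, you might also find the -z option useful. It switches the line delimiter from newline to the NUL character for both input and output, so the input files must themselves be NUL-delimited:

comm -z file1.txt file2.txt

This can be particularly helpful when dealing with filenames or other data that might contain newlines, for example lists produced by find -print0 and sorted with sort -z.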
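A minimal sketch of a NUL-delimited comparison follows, assuming GNU find, sort and comm. The directory and file names are made up for the demonstration, and the final tr is there only to make the result printable.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/dir1" "$workdir/dir2"
touch "$workdir/dir1/report.txt" "$workdir/dir1/notes.txt"
touch "$workdir/dir2/report.txt" "$workdir/dir2/todo.txt"

# Build NUL-delimited, sorted lists of relative file names for each directory.
(cd "$workdir/dir1" && find . -type f -print0 | sort -z) > "$workdir/list1"
(cd "$workdir/dir2" && find . -type f -print0 | sort -z) > "$workdir/list2"

# -z makes comm treat NUL as the record separator on input and output.
shared=$(comm -z -12 "$workdir/list1" "$workdir/list2" | tr '\0' '\n')
echo "$shared"

rm -rf "$workdir"
```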

Advanced Usage Scenarios

The comm command becomes even more powerful when combined with other Linux utilities. Here are some advanced usage scenarios:

Comparing directory contents:

comm <(ls directory1 | sort) <(ls directory2 | sort)

This command compares the file names in two directories, showing which files are unique to each directory and which are common to both.

Working with standard input:

cat file1.txt | comm - file2.txt

This example compares the contents of file1.txt (fed through standard input) with file2.txt. Using the hyphen (-) tells comm to read from standard input instead of a file.

Performing complex set operations:
Let’s say we have two files containing lists of plants and foods, and we want to find items that are in either list but not in both (symmetric difference):

comm -3 <(sort plants.txt) <(sort foods.txt)

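Or to cross-check that result: the union of both lists minus their intersection should equal the symmetric difference, so the following diff should print nothing (assuming neither file contains duplicate lines or lines with embedded tabs):

diff <(comm -23 <(sort -u plants.txt foods.txt) <(comm -12 <(sort plants.txt) <(sort foods.txt))) <(comm -3 <(sort plants.txt) <(sort foods.txt) | tr -d '\t')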

These examples demonstrate how comm can be leveraged for complex data operations when combined with other commands.

Practical Use Cases

The comm command has numerous practical applications in real-world Linux environments:

Configuration file management:

comm -3 <(sort original_config.txt) <(sort new_config.txt)

This helps identify configuration changes between versions, showing lines that have been added or removed without displaying unchanged settings.
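To make the two remaining columns easier to read, the output can be labeled with awk, as in this sketch. The sample settings are invented, and the approach assumes no configuration line itself begins with a tab, since the leading tab is what marks a line as coming from the second file.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'port=22' 'timeout=30' 'user=admin' > "$workdir/original_config.txt"
printf '%s\n' 'port=2222' 'timeout=30' 'user=admin' > "$workdir/new_config.txt"
sort "$workdir/original_config.txt" > "$workdir/old.sorted"
sort "$workdir/new_config.txt" > "$workdir/new.sorted"

# With -3, lines only in the new file start with one tab; lines only in the old file do not.
changes=$(comm -3 "$workdir/old.sorted" "$workdir/new.sorted" |
  awk '/^\t/ { sub(/^\t/, ""); print "+ " $0; next } { print "- " $0 }')
echo "$changes"

rm -rf "$workdir"
```

Here "-" marks a line present only in the original file and "+" a line present only in the new one.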

Data validation and cleansing:

comm -23 <(sort master_list.txt) <(sort exceptions.txt) > clean_list.txt

This removes all entries in an exceptions list from the master list, creating a clean dataset.

Log file analysis:

comm -12 <(grep ERROR log1.txt | sort) <(grep ERROR log2.txt | sort)

This finds common error messages across multiple log files, helping identify persistent issues.

System administration:

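comm -13 <(apt list --installed 2>/dev/null | sort) <(apt list --installed -a 2>/dev/null | sort)

On a Debian-based system, -a makes apt list every version it knows about for each installed package, so the lines unique to the second listing are the additional versions available from your repositories rather than packages installed twice. apt’s output format is not guaranteed to be stable, so treat this as an interactive aid rather than something to build scripts on.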

Troubleshooting Common Issues

When working with the comm command, you might encounter several common issues:

1. “Not in sorted order” errors

Solution: Pre-sort your files. The --nocheck-order option only silences the message and does not fix the comparison:

comm <(sort file1.txt) <(sort file2.txt)

2. Empty or incorrect output

Check if your files have different line endings (Windows vs. Unix), which can cause comparison issues. The dos2unix tool converts files in place:

dos2unix file1.txt file2.txt
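If you would rather not modify the files, or dos2unix is not installed, a sketch like the following detects carriage returns and strips them on the fly. The file names and sample contents are illustrative.

```shell
workdir=$(mktemp -d)
printf 'alpha\r\nbeta\r\n' > "$workdir/windows.txt"
printf 'alpha\nbeta\n' > "$workdir/unix.txt"

# Count lines that end in a carriage return; anything above 0 suggests CRLF endings.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr\$" "$workdir/windows.txt")
echo "lines with CR: $crlf_lines"

# Strip carriage returns in the pipeline and hand the result to comm on standard input.
fixed=$(tr -d '\r' < "$workdir/windows.txt" | sort | comm -12 - "$workdir/unix.txt")
echo "$fixed"

rm -rf "$workdir"
```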

3. Performance issues with large files

For very large files, consider using temporary files with sort rather than process substitution:

sort file1.txt > file1_sorted.txt
sort file2.txt > file2_sorted.txt
comm file1_sorted.txt file2_sorted.txt

4. Character encoding problems

Ensure both files use the same character encoding to avoid comparison issues:

iconv -f ISO-8859-1 -t UTF-8 file1.txt > file1_utf8.txt

5. Issues with custom delimiters

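When using --output-delimiter with special characters such as tabs, keep in mind that comm does not interpret backslash escapes itself, so "\t" inside ordinary quotes is passed as a literal backslash and t. In bash, use $'...' quoting so the shell expands \t into a real tab:

comm --output-delimiter=$'\t|\t' file1.txt file2.txt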

Being aware of these potential issues and their solutions will help you use the comm command more effectively in various scenarios.

Best Practices and Tips

To make the most of the comm command, consider these best practices:

1. Always pre-sort your files for reliable results:

comm <(sort -u file1.txt) <(sort -u file2.txt)

Adding -u to sort removes duplicates for cleaner comparisons.
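The effect of duplicates can be seen in a small sketch with made-up data. comm pairs repeated lines one-to-one, so a surplus copy in one file is reported as unique to that file unless duplicates are removed first.

```shell
workdir=$(mktemp -d)
printf '%s\n' apple apple banana > "$workdir/a.txt"
printf '%s\n' apple banana > "$workdir/b.txt"

# Without de-duplication: the second "apple" has no partner in b.txt.
with_dupes=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")
echo "unique to a.txt before sort -u: $with_dupes"

# After sort -u both files hold each line once, so nothing is unique to a.txt.
sort -u "$workdir/a.txt" > "$workdir/a.uniq"
sort -u "$workdir/b.txt" > "$workdir/b.uniq"
deduped=$(comm -23 "$workdir/a.uniq" "$workdir/b.uniq")
echo "unique to a.txt after sort -u: $deduped"

rm -rf "$workdir"
```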

2. Choose appropriate delimiters for readability:

comm --output-delimiter=" | " file1.txt file2.txt

Visual separators make output easier to interpret.

3. Use meaningful column combinations for specific tasks:

  • comm -12: Find common elements (intersection)
  • comm -3: Find differences only (symmetric difference)
  • comm -23: Find elements unique to first file (difference)

4. Combine with other tools for powerful workflows:

grep "^[A-Z]" file1.txt | sort | comm - <(sort file2.txt)

This filters file1 for lines starting with capital letters before comparison.

5. Consider preprocessing files when dealing with special cases:

tr '[:upper:]' '[:lower:]' < file1.txt | sort | comm - <(tr '[:upper:]' '[:lower:]' < file2.txt | sort)

This performs a case-insensitive comparison.

Following these practices will help you create more efficient and effective file comparison workflows using the comm command.

r00t

r00t is an experienced Linux enthusiast and technical writer with a passion for open-source software. With years of hands-on experience in various Linux distributions, r00t has developed a deep understanding of the Linux ecosystem and its powerful tools. r00t holds certifications in SCE, has contributed to several open-source projects, and is dedicated to sharing this knowledge and expertise through well-researched and informative articles, helping others navigate the world of Linux with confidence.