
How To Install Apache Kafka on Rocky Linux 10

Apache Kafka has become the backbone of modern data architectures, powering real-time data pipelines and streaming applications for thousands of companies worldwide. This distributed streaming platform excels at handling high-throughput, fault-tolerant publish-subscribe messaging systems that process millions of events per second. Rocky Linux 10, with its enterprise-grade stability and RHEL compatibility, provides an ideal foundation for hosting Kafka clusters in production environments. This comprehensive guide walks you through every step of installing and configuring Apache Kafka on Rocky Linux 10, from initial system preparation to verification testing. By the end of this tutorial, you’ll have a fully operational Kafka installation ready to handle your streaming data workloads.

Prerequisites

Before diving into the installation process, ensure your system meets the necessary requirements for running Apache Kafka smoothly.

System Requirements

Your Rocky Linux 10 server needs adequate resources to support Kafka operations effectively. A minimum of 4GB RAM is required, though 8GB or more is strongly recommended for production deployments. Allocate at least 20GB of free disk space to accommodate Kafka binaries, log files, and message data. You’ll need root or sudo privileges to install packages and configure system services. A stable internet connection is essential for downloading Apache Kafka and its dependencies.

Technical Requirements

Apache Kafka runs on the Java Virtual Machine, requiring Java 11 or higher. For newer Kafka versions (3.0+), Java 17 or later is recommended for optimal performance and security updates. Basic familiarity with Linux command-line operations will help you navigate the installation process smoothly. Secure SSH access to your server enables remote administration and configuration tasks.

Network Requirements

Kafka uses specific network ports for communication. Port 9092 serves as the default broker communication port, while port 9093 handles controller traffic in KRaft mode. If you’re using the traditional ZooKeeper architecture, port 2181 must also be accessible. Understanding firewall configuration basics ensures proper network connectivity for your Kafka cluster.

Modern Kafka deployments can choose between KRaft mode (ZooKeeper-free) and traditional ZooKeeper-based setups. KRaft mode represents the future of Kafka architecture, eliminating external dependencies and simplifying cluster management.

Step 1: Update Your Rocky Linux 10 System

Maintaining an up-to-date system forms the foundation of secure and stable software installations. Begin by refreshing your package repository metadata and upgrading existing packages.

Open your terminal and execute the following command to refresh the package metadata and upgrade all installed packages:

sudo dnf upgrade --refresh -y

The --refresh flag forces dnf to synchronize its local package database with the remote repositories before upgrading, so you always install the latest available versions. Note that on dnf, update is simply an alias for upgrade, so there is no need to run both commands.

Install essential utilities that you’ll need throughout the Kafka setup process:

sudo dnf install wget tar curl git unzip -y

These tools facilitate file downloads, archive extraction, and version control operations. If your system update included kernel modifications, reboot your server to apply the changes:

sudo reboot

After rebooting, reconnect to your server via SSH. Your Rocky Linux 10 system is now current and ready for Kafka installation.

Step 2: Install Java Development Kit (JDK)

Why Java is Required

Apache Kafka’s entire architecture is built on the Java Virtual Machine, making Java an absolute prerequisite. Kafka requires a minimum of Java 11, though Java 17 or later delivers improved performance and security features for Kafka 3.0 and later.

Installing OpenJDK 11

Rocky Linux 10 provides OpenJDK packages through its default repositories. Install Java 11 with the following command:

sudo dnf install java-11-openjdk java-11-openjdk-devel -y

The java-11-openjdk package contains the Java Runtime Environment, while java-11-openjdk-devel includes development tools and libraries necessary for running Kafka services.

Alternative: Installing Java 17

For enhanced performance and access to the latest JVM optimizations, consider installing Java 17:

sudo dnf install java-17-openjdk java-17-openjdk-devel -y

Java 17 offers better garbage collection algorithms and reduced memory footprint, particularly beneficial for high-throughput Kafka deployments.
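
If both Java versions are installed, the alternatives utility (standard on RHEL-based systems) lets you choose which version the java command points to:

sudo alternatives --config java

Select the desired version from the interactive menu.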

Verifying Java Installation

Confirm that Java installed correctly by checking its version:

java -version

You should see output displaying the OpenJDK version number, confirming successful installation. Check the Java compiler version as well:

javac -version

Setting JAVA_HOME

Some Kafka scripts require the JAVA_HOME environment variable. Set it system-wide by appending it to /etc/environment, then load it into your current shell:

echo "JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" | sudo tee -a /etc/environment
source /etc/environment

Because /etc/environment is normally read at login, new sessions pick up the variable automatically; the source command applies it to your current shell only.

Verify the JAVA_HOME setting:

echo $JAVA_HOME

Step 3: Create Dedicated Kafka User

Security Best Practices

Running services as dedicated system users follows the principle of least privilege, a fundamental security concept. This approach limits potential damage if a service becomes compromised. Kafka should never run as the root user in production environments.

Creating the Kafka System User

Create a dedicated system user for running Kafka services:

sudo useradd -r -d /opt/kafka -s /usr/sbin/nologin kafka

Let’s break down this command. The -r flag creates a system account with a UID in the reserved system range and no password aging. The -d /opt/kafka parameter sets the home directory. The -s /usr/sbin/nologin option prevents interactive shell login, enhancing security.

This configuration ensures Kafka runs in a controlled environment with restricted privileges. The kafka user can execute Kafka services but cannot log in directly to the system.
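
You can confirm the account was created correctly:

id kafka

The output should show the kafka user and group with a UID and GID in the system range.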

Step 4: Download Apache Kafka

Navigating to Installation Directory

Change to the /opt directory, the standard location for third-party software on Linux systems:

cd /opt

This directory provides a clean separation between system-managed software and manually installed applications.

Downloading Latest Kafka Release

Download the latest Apache Kafka binary distribution from the official mirror network:

sudo wget https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz

The filename follows the convention kafka_[scala-version]-[kafka-version].tgz. The “2.13” refers to the Scala version used to build Kafka, while “3.8.0” is the Kafka version. If this release has since been superseded, Apache moves older versions to https://archive.apache.org/dist/kafka/, so adjust the URL accordingly.

Alternatively, use curl if you prefer:

sudo curl -O https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz

Verifying Download

Check that the file downloaded completely by examining its size:

ls -lh kafka_2.13-3.8.0.tgz

For production environments, verify the download’s integrity using checksums available on the Apache Kafka website.
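
As a quick sketch, download the published SHA-512 checksum file and compare it against a locally computed digest. Note that Apache’s .sha512 files use a multi-line format, so compare the hex digests visually rather than relying on sha512sum -c:

sudo wget https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz.sha512
sha512sum kafka_2.13-3.8.0.tgz
cat kafka_2.13-3.8.0.tgz.sha512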

Understanding Binary vs Source

Binary distributions come pre-compiled and ready to run, making them ideal for most installations. Source distributions require compilation and are typically used only for custom builds or when contributing to Kafka development.

Step 5: Extract and Configure Kafka Installation

Extracting the Archive

Extract the downloaded Kafka archive using tar:

sudo tar -xzf kafka_2.13-3.8.0.tgz

The tar command uses three flags: -x extracts files, -z handles gzip compression, and -f specifies the filename.

Renaming and Moving Directory

Rename the extracted directory to a simpler path:

sudo mv kafka_2.13-3.8.0 kafka

This creates a version-agnostic path at /opt/kafka, simplifying configuration and future upgrades.

Setting Proper Ownership

Transfer ownership of the entire Kafka directory to the kafka user:

sudo chown -R kafka:kafka /opt/kafka

The -R flag applies ownership recursively to all subdirectories and files. Verify the ownership change:

ls -la /opt/ | grep kafka

Creating Log Directory

Establish a dedicated directory for Kafka data logs:

sudo -u kafka mkdir -p /opt/kafka/logs

This directory stores message data, partition logs, and metadata. Proper log directory configuration is crucial for Kafka’s performance and data persistence.

Step 6: Configure Kafka Server Properties

Understanding KRaft vs ZooKeeper Mode

Apache Kafka traditionally relied on ZooKeeper for cluster coordination and metadata management. KRaft (Kafka Raft) mode, production-ready since Kafka 3.3, eliminates this dependency, simplifying the architecture and improving performance. For new installations on Rocky Linux 10, KRaft mode is recommended: it represents Kafka’s future direction, and ZooKeeper support is removed entirely in Kafka 4.0.

Editing server.properties for KRaft Mode

Open the server configuration file in your preferred text editor. Note that the Kafka 3.x distribution also ships a KRaft sample configuration in /opt/kafka/config/kraft/server.properties that you can use as a reference; this guide keeps all settings in the file referenced by the systemd unit below:

sudo nano /opt/kafka/config/server.properties

Configure the following essential parameters for KRaft mode. Set the process roles to enable both broker and controller functions:

process.roles=broker,controller

Assign a unique node identifier:

node.id=1

Configure the controller quorum voters:

controller.quorum.voters=1@localhost:9093

Set up network listeners for broker and controller communication:

listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092
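
The advertised.listeners value is the address brokers hand out to connecting clients. If clients will connect from other machines, replace localhost with your server’s routable hostname or IP address, for example (substitute your own address):

advertised.listeners=PLAINTEXT://your.server.ip:9092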

Define the listener security protocol mapping:

listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT

Specify the inter-broker listener name, and declare which listener the controller uses (controller.listener.names is required in KRaft mode):

inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER

Set the log directory path:

log.dirs=/opt/kafka/logs

Alternative: ZooKeeper Mode Configuration

If you prefer traditional ZooKeeper-based Kafka, configure these parameters instead:

broker.id=1
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://localhost:9092
log.dirs=/opt/kafka/logs
zookeeper.connect=localhost:2181

Advanced Configuration Options

Optimize performance by adjusting network and I/O threads:

num.network.threads=3
num.io.threads=8

Configure data retention policies based on your requirements:

log.retention.hours=168
log.segment.bytes=1073741824

The retention period determines how long Kafka retains messages before deletion; 168 hours equals 7 days. The segment size (1073741824 bytes, or 1 GiB) affects disk I/O patterns and cleanup efficiency, since retention is enforced by deleting whole segments.

Formatting Log Directory for KRaft

KRaft mode requires one-time log directory formatting before first startup:

KAFKA_CLUSTER_ID="$(sudo -u kafka /opt/kafka/bin/kafka-storage.sh random-uuid)"

Format the storage using the generated cluster ID:

sudo -u kafka /opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /opt/kafka/config/server.properties

This operation initializes the metadata log and prepares the cluster for operation. Never format a running cluster, as this destroys all existing data.
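
Before starting the service, you can confirm the directory was formatted; the info subcommand prints the cluster metadata that format wrote:

sudo -u kafka /opt/kafka/bin/kafka-storage.sh info -c /opt/kafka/config/server.properties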

Step 7: Create Systemd Service Files

Why Systemd Services are Important

Systemd integration provides professional service management capabilities. Services configured through systemd start automatically during system boot. Systemd monitors service health and can automatically restart failed processes. Centralized logging through journald simplifies troubleshooting and monitoring.

Creating ZooKeeper Service (if using ZooKeeper mode)

For ZooKeeper-based deployments, create a ZooKeeper service file:

sudo nano /etc/systemd/system/zookeeper.service

Add the following configuration:

[Unit]
Description=Apache Zookeeper Server
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

This service file ensures ZooKeeper starts before Kafka and restarts automatically on failure.

Creating Kafka Service

Create the Kafka systemd service file:

sudo nano /etc/systemd/system/kafka.service

For KRaft mode, use this configuration:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Adjust the JAVA_HOME path in the Environment directive to match your installed Java version (for Java 17 it is typically /usr/lib/jvm/jre-17-openjdk). For ZooKeeper mode, add these directives under the [Unit] section:

Requires=zookeeper.service
After=zookeeper.service

Reloading Systemd Daemon

Inform systemd about the new service files:

sudo systemctl daemon-reload

This command parses the new service definitions and makes them available for management. Verify that systemd recognizes the services:

sudo systemctl list-unit-files | grep -E 'kafka|zookeeper'

Step 8: Start and Enable Kafka Services

Starting Services

Launch the Kafka service:

For KRaft mode:

sudo systemctl start kafka

For ZooKeeper mode, start ZooKeeper first, then Kafka:

sudo systemctl start zookeeper
sudo systemctl start kafka

The startup process typically takes 10-30 seconds depending on your system resources.

Enabling Services for Auto-Start

Configure services to start automatically during system boot:

sudo systemctl enable kafka

If using ZooKeeper:

sudo systemctl enable zookeeper
sudo systemctl enable kafka

Enabling services ensures your Kafka cluster remains available after system reboots or power cycles.

Checking Service Status

Verify that Kafka is running properly:

sudo systemctl status kafka

Look for “active (running)” in the output. The status display shows the service state, process ID, and recent log entries. If you see “failed” or “inactive,” investigate the logs for error messages.

For ZooKeeper mode, also check:

sudo systemctl status zookeeper

Viewing Service Logs

Monitor Kafka service logs in real-time:

sudo journalctl -u kafka -f

Press Ctrl+C to stop following logs. View the last 100 lines of Kafka logs:

sudo journalctl -u kafka -n 100

Successful startup logs typically include messages about socket server initialization and completion of startup sequence.
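
For example, in KRaft mode a healthy startup ends with a “Kafka Server started” message from KafkaRaftServer (exact wording varies between versions):

sudo journalctl -u kafka | grep -i "started"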

Step 9: Configure Firewall for Kafka

Understanding Required Ports

Kafka requires specific network ports for client and inter-broker communication. Port 9092 handles all Kafka client connections and data transfer. Port 9093 serves controller traffic in KRaft mode. ZooKeeper-based setups need port 2181 open for client connections.

Checking Firewalld Status

Verify that firewalld is active on your system:

sudo systemctl status firewalld

If firewalld is inactive, start and enable it:

sudo systemctl start firewalld
sudo systemctl enable firewalld

Adding Firewall Rules

Open the necessary ports for Kafka operation:

sudo firewall-cmd --permanent --add-port=9092/tcp

For KRaft mode, also open the controller port:

sudo firewall-cmd --permanent --add-port=9093/tcp

If using ZooKeeper:

sudo firewall-cmd --permanent --add-port=2181/tcp

The --permanent flag ensures rules persist across firewall restarts.

Reloading Firewall Configuration

Apply the new firewall rules:

sudo firewall-cmd --reload

Verify that ports are now open:

sudo firewall-cmd --list-ports

You should see ports 9092, and potentially 9093 or 2181, listed in the output.

Security Considerations

For production environments, restrict access to specific IP addresses or networks. Create rich rules for granular control:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="9092" accept'

This command allows only clients from the 192.168.1.0/24 subnet to connect to Kafka. Always follow the principle of least privilege when configuring network access.

Step 10: Verify Kafka Installation

Testing Kafka Broker Connectivity

Confirm that Kafka is listening and responding to requests:

sudo /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092

This command queries the broker for supported API versions. Successful output displays a detailed list of API endpoints, confirming that Kafka is operational.

Creating a Test Topic

Topics organize messages within Kafka. Create a test topic to verify cluster functionality:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Partitions enable parallel processing, while replication factor determines data redundancy. For single-node deployments, use a replication factor of 1.

Listing Topics

Display all topics in your Kafka cluster:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

The output should include “test-topic” among any other topics present.

Describing Topic Details

Examine the configuration and status of your test topic:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092

The output displays partition information, leader assignments, replicas, and in-sync replicas (ISR). Understanding this output helps you monitor cluster health and data distribution.

Checking Kafka Logs

Navigate to the Kafka log directory:

ls -lh /opt/kafka/logs/

Review the main server log file for startup messages:

tail -50 /opt/kafka/logs/server.log

Successful initialization includes messages about socket server starting, log loading completion, and the broker being in RUNNING state.

Step 11: Test Kafka with Producer and Consumer

Understanding Producer-Consumer Model

Kafka’s architecture separates message producers from consumers through topics. Producers write messages to topics without knowing who will consume them. Consumers read messages from topics independently, enabling scalable, decoupled architectures.

Starting Kafka Console Producer

Open a terminal session and launch the console producer:

sudo -u kafka /opt/kafka/bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092

The prompt changes to >, indicating the producer is ready to accept input. Type messages and press Enter after each:

>Hello Kafka on Rocky Linux 10
>This is a test message
>Streaming data works perfectly

Each line becomes a separate message in the topic. Leave this terminal open.

Starting Kafka Console Consumer

Open a second terminal session to your server. Start the console consumer:

sudo -u kafka /opt/kafka/bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092

The --from-beginning flag tells the consumer to read all existing messages from the topic’s start. Messages you typed in the producer terminal appear in the consumer output.

Verifying Message Flow

With both terminals visible, type new messages in the producer terminal. These messages appear almost instantly in the consumer terminal, demonstrating real-time streaming capabilities. Message ordering within a partition is guaranteed, ensuring predictable data flow.

Testing with Key-Value Messages

Stop the current producer with Ctrl+C. Restart it with key parsing enabled:

sudo -u kafka /opt/kafka/bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:

Send messages with keys:

user1:Login event
user2:Purchase completed
user1:Logout event

Keys affect partition assignment and enable consumer groups to process related messages together.
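
To see keys on the consuming side, run the console consumer with key printing enabled:

sudo -u kafka /opt/kafka/bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092 --property print.key=true --property key.separator=: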

Stopping Producer and Consumer

Gracefully stop both processes using Ctrl+C in each terminal. This closes connections and releases resources properly.

Step 12: Basic Kafka Management Operations

Managing Topics

Modify existing topic configurations using the alter command:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --alter --topic test-topic --partitions 3 --bootstrap-server localhost:9092

Note that you can only increase partition count, never decrease it. Reducing partitions risks data loss and breaks consumer offsets.

Change retention policies for specific topics:

sudo -u kafka /opt/kafka/bin/kafka-configs.sh --alter --entity-type topics --entity-name test-topic --add-config retention.ms=604800000 --bootstrap-server localhost:9092

This sets retention to 7 days (604800000 milliseconds).
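
Verify the override took effect by describing the topic’s dynamic configuration:

sudo -u kafka /opt/kafka/bin/kafka-configs.sh --describe --entity-type topics --entity-name test-topic --bootstrap-server localhost:9092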

Monitoring Kafka Performance

Consumer groups track reading progress across topic partitions. Console consumers are assigned auto-generated group names such as console-consumer-12345, so list the groups known to the broker first:

sudo -u kafka /opt/kafka/bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

Then describe a group using its actual name from the list:

sudo -u kafka /opt/kafka/bin/kafka-consumer-groups.sh --describe --group console-consumer-12345 --bootstrap-server localhost:9092

The output shows current offset, log end offset, and lag for each partition. High lag indicates consumers cannot keep pace with producers, suggesting capacity issues.

Monitor per-partition topic offsets with the bundled offsets tool (the older kafka.tools.GetOffsetShell invocation via kafka-run-class.sh is deprecated in Kafka 3.x):

sudo -u kafka /opt/kafka/bin/kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic test-topic

Deleting Topics

Remove topics that are no longer needed:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --delete --topic test-topic --bootstrap-server localhost:9092

Topic deletion requires delete.topic.enable=true in server.properties; this is the default in modern Kafka releases, so deletion works unless it has been explicitly disabled. Deleted topics cannot be recovered unless you have external backups.

Kafka Connect Introduction

Kafka Connect provides a framework for streaming data between Kafka and external systems. Configuration files reside in /opt/kafka/config/. Common use cases include database change data capture, log aggregation, and cloud storage integration.
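
As a minimal sketch, you can try Connect in standalone mode using the sample file-source connector that ships with the distribution; it streams lines from a local file into a Kafka topic (check the two .properties files for the exact file path and topic name they use):

sudo -u kafka /opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties /opt/kafka/config/connect-file-source.properties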

Performance Tuning Tips

Adjust JVM heap size for better memory management. Edit the Kafka startup script or set KAFKA_HEAP_OPTS environment variable:

export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"

Optimize batch sizes in producer configurations for higher throughput. Tune network buffer sizes to match your network capacity and message patterns.
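
For the systemd-managed service, the cleaner approach is to set the heap in the unit file rather than exporting it in a shell. Add an Environment directive to the [Service] section of /etc/systemd/system/kafka.service:

Environment="KAFKA_HEAP_OPTS=-Xmx4G -Xms4G"

Then run sudo systemctl daemon-reload and restart Kafka for the change to take effect.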

Common Troubleshooting Issues

Service Fails to Start

If Kafka won’t start, first verify Java installation:

java -version
echo $JAVA_HOME

Check file permissions on the Kafka directory:

ls -la /opt/kafka

All files should be owned by the kafka user and group. Review systemd service logs for detailed error messages:

sudo journalctl -xe -u kafka

Ensure ports 9092 and 9093 aren’t already in use. Use ss, since the legacy net-tools package that provides netstat is not installed by default on Rocky Linux:

sudo ss -tuln | grep -E '9092|9093'

Connection Refused Errors

Verify that Kafka is listening on the correct interface:

sudo ss -tuln | grep 9092

You should see Kafka listening on 0.0.0.0:9092 or your specific IP address. Check firewall rules:

sudo firewall-cmd --list-all

Verify the listeners configuration in server.properties matches your network setup. The advertised.listeners property must be accessible from client machines.

ZooKeeper Connection Issues

For ZooKeeper-based deployments, ensure ZooKeeper starts before Kafka:

sudo systemctl status zookeeper

Verify the zookeeper.connect property in server.properties matches your ZooKeeper configuration. Test ZooKeeper connectivity (install the client first with sudo dnf install telnet -y if it is missing):

telnet localhost 2181

Type ruok (are you ok) and press Enter. A healthy ZooKeeper responds with imok.
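
Note that recent ZooKeeper releases restrict four-letter-word commands by default. If ruok produces no response, whitelist it in /opt/kafka/config/zookeeper.properties and restart ZooKeeper (verify the exact property against your ZooKeeper version’s documentation):

4lw.commands.whitelist=ruok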

Disk Space Problems

Monitor disk usage regularly:

df -h /opt/kafka/logs

Adjust log retention settings to match available storage:

log.retention.hours=24
log.retention.bytes=1073741824

Implement automatic cleanup policies by ensuring log.cleanup.policy=delete in server.properties.

Memory Issues

Monitor Java heap usage and system memory:

free -m
top -p $(pgrep -f kafka)

If Kafka consumes excessive memory, reduce heap size in KAFKA_HEAP_OPTS or kafka-server-start.sh. Consider adding more RAM for production workloads.

Security Best Practices

Authentication Configuration

Implement SASL (Simple Authentication and Security Layer) for client authentication. Configure SSL/TLS to encrypt data in transit between clients and brokers. Set up ACLs (Access Control Lists) to restrict topic access:

sudo -u kafka /opt/kafka/bin/kafka-acls.sh --add --allow-principal User:client1 --operation Read --topic test-topic --bootstrap-server localhost:9092
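
Keep in mind that kafka-acls.sh only takes effect once an authorizer is enabled in server.properties; for KRaft clusters the built-in option is StandardAuthorizer, shown here as an illustrative configuration line:

authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer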

Network Security

Limit firewall access to trusted IP addresses. The following example accepts all traffic from the 10.0.0.0/8 private range; combine source restrictions with the port rules from Step 9 for tighter control:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" accept'

Use private networks for inter-broker communication in multi-node clusters. Implement network segmentation to isolate Kafka traffic from public networks.

User and Permission Management

Running Kafka as a dedicated non-root user (already implemented) is essential. Set restrictive file permissions:

sudo chmod 750 /opt/kafka
sudo chmod 640 /opt/kafka/config/*.properties

Conduct regular security audits to identify potential vulnerabilities.

Data Security

Consider encryption at rest for sensitive data. Enable SSL for inter-broker communication in multi-broker clusters:

security.inter.broker.protocol=SSL

Store sensitive configuration parameters separately from application code.

Regular Updates

Keep your Kafka installation current with security patches. Monitor Apache Kafka security advisories and the Rocky Linux security list. Plan upgrade strategies that minimize downtime and data loss risk.

Congratulations! You have successfully installed Apache Kafka. Thanks for using this tutorial to install the Apache Kafka distributed streaming platform on your Rocky Linux 10 system. For additional help or useful information, we recommend you check the official Apache Kafka documentation.
