How To Install Apache Kafka on AlmaLinux 10

Apache Kafka has revolutionized how organizations handle real-time data streaming and event-driven architectures. As a distributed streaming platform, it processes millions of events per second while maintaining exceptional reliability and fault tolerance. AlmaLinux 10, the latest RHEL-compatible enterprise Linux distribution, provides an ideal foundation for deploying Kafka with enhanced security features, extended hardware support, and long-term stability. This comprehensive guide walks you through every step of installing and configuring Apache Kafka on AlmaLinux 10, from initial system preparation to production-ready deployment.

Whether you’re building real-time data pipelines, implementing log aggregation systems, or developing event-driven microservices, mastering Kafka installation on AlmaLinux 10 positions you for success. System administrators, DevOps engineers, and developers will find actionable instructions, security best practices, and troubleshooting strategies to ensure a smooth deployment.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform designed for high-throughput, low-latency data processing. Originally developed by LinkedIn and later donated to the Apache Software Foundation, Kafka has become the de facto standard for real-time data streaming across industries. It excels at handling massive volumes of data streams, processing them reliably, and making them available to multiple consumers simultaneously.

Organizations leverage Kafka for various mission-critical applications. Log aggregation systems collect and centralize logs from multiple services. Metrics collection platforms gather telemetry data from distributed systems. Stream processing applications perform real-time analytics on flowing data. Message queuing systems decouple microservices architectures. Companies like Netflix, Uber, LinkedIn, and Twitter rely on Kafka to power their data infrastructure, processing trillions of messages daily.

The platform’s key benefits include exceptional scalability through horizontal scaling, fault tolerance via data replication, durability through persistent storage, and impressive throughput measured in millions of messages per second. These capabilities make Kafka indispensable for modern data-driven organizations requiring real-time insights and reliable data delivery.

Understanding Apache Kafka Architecture

Kafka’s distributed architecture comprises several interconnected components working together seamlessly. Brokers serve as the core servers that store, manage, and serve data to clients. Each broker handles thousands of read and write operations per second, maintaining data partitions and coordinating with other brokers.

Topics and Partitions organize data streams logically. Topics represent categories or feed names where records are published. Each topic divides into multiple partitions, enabling parallel processing and scalability. Partitions distribute across broker nodes, providing redundancy and load distribution. This partitioning strategy allows Kafka to handle massive data volumes efficiently.

Producers are client applications that publish data streams to Kafka topics. They decide which partition receives each message, either through explicit specification or automatic distribution based on message keys. Producers can batch messages for efficiency and configure acknowledgment levels for reliability.

Consumers and Consumer Groups read data from Kafka topics. Individual consumers subscribe to topics and process messages sequentially. Consumer groups enable parallel processing: multiple consumers coordinate to divide partition consumption, ensuring each message is delivered to only one consumer within the group.
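
Once your broker is running (see the testing section later in this guide), you can observe this behavior by starting two console consumers with the same --group flag in separate terminals; Kafka splits the topic's partitions between them. The demo-group name here is just an illustrative placeholder:

/opt/kafka/bin/kafka-console-consumer.sh --topic test-topic --group demo-group --bootstrap-server localhost:9092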

Kafka historically relied on ZooKeeper for cluster coordination, metadata management, and leader election. However, Kafka 2.8 introduced KRaft mode (Kafka Raft), which eliminates the ZooKeeper dependency by implementing consensus directly within Kafka. This guide uses the traditional ZooKeeper approach for compatibility, but note that ZooKeeper support is deprecated and slated for removal in Kafka 4.0, making KRaft Kafka's future architecture.
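
For reference, a minimal single-node KRaft bootstrap looks like the following sketch (not used in the rest of this guide; it assumes the /opt/kafka layout established later and the sample KRaft configuration shipped with the distribution):

# Generate a cluster ID and format the metadata storage (one-time step):
KAFKA_CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
/opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /opt/kafka/config/kraft/server.properties
# Start the broker with the KRaft configuration instead of ZooKeeper:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties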

Prerequisites for Installing Kafka on AlmaLinux 10

Before proceeding with installation, ensure your environment meets these requirements. You’ll need an AlmaLinux 10 server with minimum specifications of 2GB RAM, 2 CPU cores, and 20GB available storage. Production environments should provision significantly more resources based on expected throughput and retention requirements.

Administrative access through root or sudo privileges is essential for installing packages, creating system users, and configuring services. Network connectivity enables downloading Kafka binaries and dependencies from official repositories. Basic familiarity with Linux command-line operations, text editors like vim or nano, and firewall configuration will streamline the installation process.

Apache Kafka requires Java Development Kit (JDK) version 11 or later. Verify your Java installation by running java -version in the terminal. If Java isn’t installed, you’ll add it during the installation process. Understanding basic systemd service management helps with starting, stopping, and monitoring Kafka services effectively.

Step 1: Update AlmaLinux 10 System

System updates ensure you have the latest security patches, bug fixes, and performance improvements. Open your terminal and execute the following command with administrative privileges:

sudo dnf upgrade -y

The dnf package manager, AlmaLinux 10's default package management tool, checks for available updates across all installed packages and system components and applies them (dnf update is simply an alias for dnf upgrade, so there is no need to run both). The -y flag automatically confirms installation prompts, streamlining the update process. This command may take several minutes depending on your internet connection and the number of pending updates.

After updates complete, especially if kernel updates were applied, consider rebooting your system to ensure all changes take effect properly:

sudo reboot

Wait a few minutes for your system to restart before reconnecting and continuing with the installation.

Step 2: Install Java Development Kit (JDK)

Apache Kafka is built on Java and requires a Java Runtime Environment. AlmaLinux 10 provides OpenJDK packages through its official repositories. Install OpenJDK 17, a long-term support release, using this command:

sudo dnf install java-17-openjdk-devel -y

The java-17-openjdk-devel package includes the complete Java Development Kit with compilation tools and libraries. After installation completes, verify Java is properly configured:

java -version

You should see output indicating OpenJDK version 17 or later. Configure the JAVA_HOME environment variable for optimal compatibility:

echo "export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" >> ~/.bashrc
echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> ~/.bashrc
source ~/.bashrc

These commands add JAVA_HOME to your bash profile, making it available across terminal sessions. Verify the environment variable:

echo $JAVA_HOME

This should display the resolved Java installation path, typically a versioned directory under /usr/lib/jvm/.
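
Note that the kafka service account created in the next step does not read your personal ~/.bashrc. The systemd units later in this guide set JAVA_HOME explicitly, so this is not a problem, but if you prefer a system-wide setting for all interactive shells, here is a sketch using /etc/profile.d (the java.sh file name is arbitrary):

sudo tee /etc/profile.d/java.sh > /dev/null << 'EOF'
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export PATH=$PATH:$JAVA_HOME/bin
EOF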

Step 3: Create Dedicated Kafka User

Running services as root violates security best practices and increases vulnerability to potential exploits. Create a dedicated system user for Kafka operations:

sudo useradd -r -m -d /opt/kafka -s /bin/bash kafka

This command creates a system user named kafka with a home directory at /opt/kafka. The -r flag designates it as a system account, the -m flag creates the home directory, and -s /bin/bash assigns a login shell. System accounts enhance security by limiting privileges and isolating service operations.

Optionally, set a password for the kafka user if you need to perform manual operations:

sudo passwd kafka

However, this is optional: the systemd services created later run directly as the kafka user, and you can always switch to the account without a password using sudo -u kafka -i.
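
Either way, confirm the account looks as expected before continuing:

id kafka
getent passwd kafka

The first command should show the kafka user with its primary kafka group; the second should list /opt/kafka as the home directory and /bin/bash as the shell.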

Step 4: Download Apache Kafka

Navigate to the Apache Kafka official download page or use wget to download the latest stable release directly. As of this writing, Kafka 3.6.x represents the stable production branch. Download the binary distribution:

cd /tmp
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz

The filename format kafka_2.13-3.6.1.tgz indicates Kafka version 3.6.1 compiled with Scala 2.13. Always download from an official Apache mirror to ensure authenticity; if the URL above returns 404, the release has likely been superseded and moved to https://archive.apache.org/dist/kafka/. For added security, download and verify the checksum:

wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz.sha512
sha512sum kafka_2.13-3.6.1.tgz
cat kafka_2.13-3.6.1.tgz.sha512

Apache publishes Kafka checksums in a GPG-style format that sha512sum -c cannot always parse directly, so compare the two outputs by hand: the hex digest printed by sha512sum must match the digest in the .sha512 file (ignore whitespace and letter case). This verification prevents corrupted or tampered downloads.
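
For even stronger assurance, you can verify the release's GPG signature against the Kafka project's published KEYS file (the URLs below follow the standard Apache distribution layout; adjust them if the mirror structure changes):

wget https://downloads.apache.org/kafka/KEYS
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz.asc
gpg --import KEYS
gpg --verify kafka_2.13-3.6.1.tgz.asc kafka_2.13-3.6.1.tgz

A "Good signature" message confirms the archive was signed by a Kafka release manager.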

Step 5: Extract and Install Kafka

With the download complete, extract the Kafka archive to the kafka user’s home directory:

sudo tar -xzf kafka_2.13-3.6.1.tgz -C /opt/kafka --strip-components=1

The --strip-components=1 flag removes the top-level directory from the archive, placing Kafka files directly in /opt/kafka rather than creating a nested subdirectory. This simplifies path management and configuration.

Set proper ownership so the kafka user can read and modify necessary files:

sudo chown -R kafka:kafka /opt/kafka

Verify the installation by listing the directory contents:

ls -la /opt/kafka

You should see directories including bin (executables), config (configuration files), libs (Java libraries), and licenses. A logs directory appears automatically once the services start writing log files.
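
You can also confirm the binaries run correctly under your Java installation:

/opt/kafka/bin/kafka-topics.sh --version

This should print the Kafka version (3.6.1) along with its build commit ID.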

Step 6: Configure ZooKeeper

ZooKeeper manages Kafka cluster metadata, broker coordination, and leader election. Locate the ZooKeeper configuration file:

sudo nano /opt/kafka/config/zookeeper.properties

Review and modify key configuration parameters. The dataDir setting specifies where ZooKeeper stores snapshots and transaction logs:

dataDir=/var/lib/zookeeper

Create this directory with appropriate permissions:

sudo mkdir -p /var/lib/zookeeper
sudo chown -R kafka:kafka /var/lib/zookeeper

The clientPort defines the port ZooKeeper listens on for client connections:

clientPort=2181

Port 2181 is the standard ZooKeeper port. Additional settings include maxClientCnxns=0, which removes the per-client connection limit, and tickTime=2000, which sets the basic time unit in milliseconds used for heartbeats and timeouts.
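
Putting it all together, a minimal single-node zookeeper.properties might look like the following sketch. The 4lw.commands.whitelist line is an optional addition that enables the four-letter health-check commands referenced in the troubleshooting section:

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
4lw.commands.whitelist=srvr,stat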

Step 7: Create ZooKeeper Systemd Service

Systemd service files enable automatic startup, monitoring, and management. Create a ZooKeeper service unit:

sudo nano /etc/systemd/system/zookeeper.service

Add the following configuration:

[Unit]
Description=Apache Zookeeper Server
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk"
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

The [Unit] section defines service metadata and dependencies. The [Service] section specifies execution parameters, including the user context, startup command, and restart behavior. The [Install] section determines when the service activates during boot.

Reload systemd to recognize the new service:

sudo systemctl daemon-reload
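
Optionally, ask systemd to sanity-check the unit for syntax mistakes before starting it:

systemd-analyze verify /etc/systemd/system/zookeeper.service

No output means the unit file parsed cleanly.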

Step 8: Configure Kafka Broker

Kafka’s main configuration file controls broker behavior, networking, storage, and retention policies. Open the server properties file:

sudo nano /opt/kafka/config/server.properties

Configure essential parameters. The broker.id uniquely identifies each broker in a cluster:

broker.id=0

For single-broker installations, use 0. Multi-broker clusters require unique IDs for each broker.

The listeners parameter defines network interfaces and ports:

listeners=PLAINTEXT://localhost:9092

For remote access, replace localhost with your server's IP address, or use 0.0.0.0 to bind all interfaces. When binding to 0.0.0.0, also set advertised.listeners=PLAINTEXT://your.server.ip:9092 so that clients are handed an address they can actually reach.

Configure data storage locations:

log.dirs=/var/kafka-logs

Create this directory with proper ownership:

sudo mkdir -p /var/kafka-logs
sudo chown -R kafka:kafka /var/kafka-logs

Set default partition count for new topics:

num.partitions=3

More partitions enable greater parallelism but consume more resources.

Configure ZooKeeper connection:

zookeeper.connect=localhost:2181

Set data retention policies:

log.retention.hours=168
log.retention.bytes=1073741824

These settings retain data for 168 hours (7 days) or up to 1GB per partition, whichever limit is reached first.

Enable topic deletion capability:

delete.topic.enable=true
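
For reference, after these edits the modified lines of server.properties should resemble this minimal single-broker sketch:

broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/var/kafka-logs
num.partitions=3
log.retention.hours=168
log.retention.bytes=1073741824
zookeeper.connect=localhost:2181
delete.topic.enable=true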

Step 9: Create Kafka Systemd Service

Similar to ZooKeeper, create a systemd service for Kafka:

sudo nano /etc/systemd/system/kafka.service

Add this configuration:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk"
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

The KAFKA_HEAP_OPTS environment variable allocates JVM heap memory. Adjust the 1G value based on your server’s available RAM. Production systems typically allocate 4-8GB or more.

Reload systemd configuration:

sudo systemctl daemon-reload

Step 10: Start and Enable Services

With configuration complete, start ZooKeeper first, as Kafka depends on it:

sudo systemctl start zookeeper

Enable ZooKeeper to start automatically on boot:

sudo systemctl enable zookeeper

Verify ZooKeeper is running properly:

sudo systemctl status zookeeper

You should see “active (running)” status with green indicators. Now start Kafka:

sudo systemctl start kafka

Enable Kafka for automatic startup:

sudo systemctl enable kafka

Check Kafka service status:

sudo systemctl status kafka

Both services should show active status. If either service fails to start, check logs using sudo journalctl -u zookeeper -n 50 or sudo journalctl -u kafka -n 50 to identify issues.
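
As a final check, confirm both daemons are listening on their expected ports:

sudo ss -tlnp | grep -E '2181|9092'

You should see Java processes bound to port 2181 (ZooKeeper) and port 9092 (Kafka).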

Step 11: Configure Firewall Rules

AlmaLinux 10 includes firewalld for network security. Open necessary ports for Kafka and ZooKeeper communication:

sudo firewall-cmd --permanent --add-port=2181/tcp
sudo firewall-cmd --permanent --add-port=9092/tcp

Port 2181 enables ZooKeeper client connections, while port 9092 allows Kafka broker communication. Reload the firewall to apply changes:

sudo firewall-cmd --reload

Verify the rules are active:

sudo firewall-cmd --list-ports

For production environments exposed to the internet, consider restricting access to specific IP addresses or implementing VPN-based access.
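
For example, to allow Kafka traffic only from a trusted subnet rather than from anywhere (192.168.1.0/24 below is a placeholder; substitute your actual client network), you could replace the open port with a firewalld rich rule:

sudo firewall-cmd --permanent --remove-port=9092/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="9092" protocol="tcp" accept'
sudo firewall-cmd --reload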

Testing Apache Kafka Installation

Validate your Kafka installation through practical testing. Navigate to the Kafka bin directory:

cd /opt/kafka/bin

Create a test topic named “test-topic” with 3 partitions and a replication factor of 1:

./kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

You should see confirmation that the topic was created successfully. List all topics to verify:

./kafka-topics.sh --list --bootstrap-server localhost:9092

Start a console producer to send messages:

./kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092

Type several messages, pressing Enter after each. For example:

Hello from AlmaLinux 10
Apache Kafka is running successfully
This is a test message

Press Ctrl+C to exit the producer. Open another terminal and start a console consumer:

cd /opt/kafka/bin
./kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092

Your previously sent messages should appear in the consumer output. This confirms end-to-end functionality. Describe the topic to view configuration details:

./kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092

This displays partition information, leader assignments, and replica locations.

Essential Kafka Commands Reference

Master these commands for daily Kafka operations. Create topics with custom configurations:

./kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 5 --replication-factor 1 --config retention.ms=86400000

Delete topics when no longer needed:

./kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092

Increase a topic's partition count (partitions can be added but never removed):

./kafka-topics.sh --alter --topic my-topic --bootstrap-server localhost:9092 --partitions 10

List consumer groups to monitor consumption:

./kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

Describe consumer group details including lag:

./kafka-consumer-groups.sh --describe --group my-consumer-group --bootstrap-server localhost:9092

Perform producer performance testing:

./kafka-producer-perf-test.sh --topic perf-test --num-records 100000 --record-size 1000 --throughput -1 --producer-props bootstrap.servers=localhost:9092

These commands provide comprehensive cluster management capabilities.

Kafka Security Hardening on AlmaLinux 10

Production deployments require robust security measures. Implement authentication using SASL (Simple Authentication and Security Layer). Configure SASL/SCRAM by creating users:

./kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config 'SCRAM-SHA-256=[password=mypassword]' --entity-type users --entity-name kafkauser

Enable SSL/TLS encryption for data in transit. Generate SSL certificates using keytool:

keytool -keystore kafka.server.keystore.jks -alias localhost -keyalg RSA -validity 365 -genkey

Configure server.properties with SSL settings:

listeners=SSL://localhost:9093
ssl.keystore.location=/opt/kafka/config/kafka.server.keystore.jks
ssl.keystore.password=your-keystore-password
ssl.key.password=your-key-password
ssl.truststore.location=/opt/kafka/config/kafka.server.truststore.jks
ssl.truststore.password=your-truststore-password

Implement Access Control Lists (ACLs) for authorization. Note that ACLs only take effect once an authorizer is configured in server.properties (for ZooKeeper-based clusters, authorizer.class.name=kafka.security.authorizer.AclAuthorizer):

./kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:kafkauser --operation Read --operation Write --topic my-topic

Configure SELinux on AlmaLinux 10 for enhanced security. Check SELinux status:

sudo getenforce

If SELinux is enforcing, create appropriate policies for Kafka's directories, or temporarily switch to permissive mode during initial setup while you collect and review denials.
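
A sketch of that workflow, assuming you build a local policy from the collected denials with audit2allow (provided by the policycoreutils-python-utils package; the kafka_local module name is arbitrary):

sudo setenforce 0
# ... start and exercise Kafka so any denials get logged, then:
sudo ausearch -m avc -ts recent | audit2allow -M kafka_local
sudo semodule -i kafka_local.pp
sudo setenforce 1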

Performance Optimization Tips

Optimize Kafka performance for AlmaLinux 10 environments. Tune JVM heap size based on available memory:

export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"

Allocate roughly 25-50% of system RAM to the Kafka heap, leaving the remainder for the operating system page cache, which Kafka relies on heavily for fast log reads and writes.

Adjust OS-level parameters for optimal performance. Increase file descriptor limits by editing /etc/security/limits.conf:

kafka soft nofile 100000
kafka hard nofile 100000
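
Note that limits.conf applies to interactive login sessions only; the systemd-managed kafka service takes its file descriptor limit from the unit file instead. Add the following line to the [Service] section of /etc/systemd/system/kafka.service, then run sudo systemctl daemon-reload and restart the service:

LimitNOFILE=100000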

Configure VM swappiness to minimize swapping:

sudo sysctl vm.swappiness=1
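
The sysctl command above only lasts until the next reboot. To make the setting persistent, drop it into a sysctl configuration file (the 99-kafka.conf file name is arbitrary):

echo 'vm.swappiness=1' | sudo tee /etc/sysctl.d/99-kafka.conf
sudo sysctl --system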

Optimize Kafka broker settings in server.properties:

num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

Use SSD storage for Kafka logs to maximize I/O throughput. Configure appropriate retention policies balancing storage costs and data requirements.

Monitoring and Logging

Effective monitoring ensures reliable Kafka operations. View real-time Kafka logs using journalctl:

sudo journalctl -u kafka -f

The -f flag follows log output continuously. Check ZooKeeper logs similarly:

sudo journalctl -u zookeeper -f

Kafka exposes JMX metrics for comprehensive monitoring. Enable JMX in kafka.service by adding:

Environment="JMX_PORT=9999"

Integrate monitoring tools like Prometheus and Grafana. Install JMX Exporter to expose Kafka metrics in Prometheus format. Download and configure the exporter, then visualize metrics using Grafana dashboards.
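
A common pattern is to run the JMX Exporter as a Java agent inside the broker JVM. A sketch, assuming you have downloaded the agent JAR and written an exporter configuration to the placeholder paths shown, added to the [Service] section of kafka.service:

Environment="KAFKA_OPTS=-javaagent:/opt/kafka/libs/jmx_prometheus_javaagent.jar=7071:/opt/kafka/config/jmx-exporter.yml"

Prometheus can then scrape metrics from port 7071; remember to open that port in firewalld if Prometheus runs on another host.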

Monitor critical metrics including message throughput, consumer lag, disk usage, network I/O, and broker availability. Set up alerts for unusual patterns indicating potential issues.

Common Troubleshooting Issues

Address frequent installation and operational challenges. If services fail to start, check Java installation and JAVA_HOME configuration. Verify no port conflicts exist:

sudo ss -tuln | grep 9092

Connection refused errors typically indicate firewall blocking or incorrect listener configuration. Review server.properties listeners parameter and firewall rules.

ZooKeeper connection failures manifest as repeated connection timeout messages. Verify ZooKeeper is running and accessible (nc is provided by the nmap-ncat package; srvr is the only four-letter command whitelisted by default, so stat and others require adding them to 4lw.commands.whitelist in zookeeper.properties):

echo srvr | nc localhost 2181

Out of memory errors require JVM heap adjustment. Monitor memory usage and increase KAFKA_HEAP_OPTS allocation. Investigate memory leaks if issues persist despite adequate allocation.

Disk space issues arise from insufficient retention policy configuration. Monitor disk usage regularly:

df -h /var/kafka-logs

Adjust retention hours or bytes in server.properties based on available storage.

Permission denied errors indicate ownership or SELinux issues. Verify kafka user owns all necessary directories and files. Check SELinux denials:

sudo ausearch -m avc -ts recent

Upgrading and Maintenance

Plan upgrades carefully to minimize downtime. Before upgrading, backup critical data and configurations:

sudo tar -czf kafka-backup-$(date +%Y%m%d).tar.gz /opt/kafka/config /var/kafka-logs

For rolling upgrades in multi-broker clusters, upgrade one broker at a time. Stop the broker, replace binaries, update configurations if needed, and restart. Monitor cluster health before proceeding to the next broker.
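
On each broker, the per-node sequence looks roughly like this sketch (paths follow the layout used in this guide; kafka_2.13-NEW-VERSION.tgz is a placeholder for the release you are upgrading to):

sudo systemctl stop kafka
sudo cp -r /opt/kafka/config /opt/kafka/config.bak
sudo tar -xzf kafka_2.13-NEW-VERSION.tgz -C /opt/kafka --strip-components=1
sudo cp /opt/kafka/config.bak/server.properties /opt/kafka/config/
sudo cp /opt/kafka/config.bak/zookeeper.properties /opt/kafka/config/
sudo chown -R kafka:kafka /opt/kafka
sudo systemctl start kafka
sudo journalctl -u kafka -n 20

Confirm a clean startup in the journal output before moving on to the next broker.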

AlmaLinux 10 provides long-term support with regular security updates. Keep the operating system current:

sudo dnf update -y

Review Kafka release notes before upgrading to understand breaking changes and new features. Test upgrades in non-production environments first.

Congratulations! You have successfully installed Apache Kafka. Thanks for using this tutorial to install the Apache Kafka distributed streaming platform on your AlmaLinux 10 system. For additional help or useful information, we recommend you check the official Apache Kafka website.
