How To Install Apache Kafka on AlmaLinux 10

Apache Kafka has revolutionized how organizations handle real-time data streaming and event-driven architectures. As a distributed streaming platform, it processes millions of events per second while maintaining exceptional reliability and fault tolerance. AlmaLinux 10, the latest RHEL-compatible enterprise Linux distribution, provides an ideal foundation for deploying Kafka with enhanced security features, extended hardware support, and long-term stability. This comprehensive guide walks you through every step of installing and configuring Apache Kafka on AlmaLinux 10, from initial system preparation to production-ready deployment.
Whether you’re building real-time data pipelines, implementing log aggregation systems, or developing event-driven microservices, mastering Kafka installation on AlmaLinux 10 positions you for success. System administrators, DevOps engineers, and developers will find actionable instructions, security best practices, and troubleshooting strategies to ensure a smooth deployment.
What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform designed for high-throughput, low-latency data processing. Originally developed by LinkedIn and later donated to the Apache Software Foundation, Kafka has become the de facto standard for real-time data streaming across industries. It excels at handling massive volumes of data streams, processing them reliably, and making them available to multiple consumers simultaneously.
Organizations leverage Kafka for various mission-critical applications. Log aggregation systems collect and centralize logs from multiple services. Metrics collection platforms gather telemetry data from distributed systems. Stream processing applications perform real-time analytics on flowing data. Message queuing systems decouple microservices architectures. Companies like Netflix, Uber, LinkedIn, and Twitter rely on Kafka to power their data infrastructure, processing trillions of messages daily.
The platform’s key benefits include exceptional scalability through horizontal scaling, fault tolerance via data replication, durability through persistent storage, and impressive throughput measured in millions of messages per second. These capabilities make Kafka indispensable for modern data-driven organizations requiring real-time insights and reliable data delivery.
Understanding Apache Kafka Architecture
Kafka’s distributed architecture comprises several interconnected components working together seamlessly. Brokers serve as the core servers that store, manage, and serve data to clients. Each broker handles thousands of read and write operations per second, maintaining data partitions and coordinating with other brokers.
Topics and Partitions organize data streams logically. Topics represent categories or feed names where records are published. Each topic divides into multiple partitions, enabling parallel processing and scalability. Partitions distribute across broker nodes, providing redundancy and load distribution. This partitioning strategy allows Kafka to handle massive data volumes efficiently.
Producers are client applications that publish data streams to Kafka topics. They decide which partition receives each message, either through explicit specification or automatic distribution based on message keys. Producers can batch messages for efficiency and configure acknowledgment levels for reliability.
Consumers and Consumer Groups read data from Kafka topics. Individual consumers subscribe to topics and process messages sequentially. Consumer groups enable parallel processing: multiple consumers coordinate to divide partition consumption, with each partition assigned to exactly one consumer in the group, so every message is delivered to only one group member. Note that this is a delivery assignment, not an exactly-once processing guarantee; Kafka's default guarantee is at-least-once.
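To make this concrete, here is a minimal sketch: start the command below in two separate terminals with the same group name (my-topic and analytics-group are placeholder names), and Kafka will split the topic's partitions between the two consumers so each message reaches only one of them:
./kafka-console-consumer.sh --topic my-topic --group analytics-group --bootstrap-server localhost:9092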
Kafka historically relied on ZooKeeper for cluster coordination, metadata management, and leader election. However, Kafka 2.8 introduced KRaft mode (Kafka Raft), which eliminates ZooKeeper dependency by implementing consensus directly within Kafka. While this guide uses the traditional ZooKeeper approach for compatibility, KRaft represents Kafka’s future architecture.
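For reference only (this guide does not use it), a single-node KRaft setup on Kafka 3.x looks roughly like this, using the sample KRaft configuration shipped in the distribution:
KAFKA_CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
/opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /opt/kafka/config/kraft/server.properties
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties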
Prerequisites for Installing Kafka on AlmaLinux 10
Before proceeding with installation, ensure your environment meets these requirements. You’ll need an AlmaLinux 10 server with minimum specifications of 2GB RAM, 2 CPU cores, and 20GB available storage. Production environments should provision significantly more resources based on expected throughput and retention requirements.
Administrative access through root or sudo privileges is essential for installing packages, creating system users, and configuring services. Network connectivity enables downloading Kafka binaries and dependencies from official repositories. Basic familiarity with Linux command-line operations, text editors like vim or nano, and firewall configuration will streamline the installation process.
Apache Kafka requires Java Development Kit (JDK) version 11 or later. Verify your Java installation by running java -version in the terminal. If Java isn’t installed, you’ll add it during the installation process. Understanding basic systemd service management helps with starting, stopping, and monitoring Kafka services effectively.
Step 1: Update AlmaLinux 10 System
System updates ensure you have the latest security patches, bug fixes, and performance improvements. Open your terminal and execute the following command with administrative privileges:
sudo dnf upgrade -y
The dnf package manager, AlmaLinux 10's default package management tool, checks for available updates across all installed packages and system components; in dnf, update is simply an alias for upgrade, so a single command suffices. The -y flag automatically confirms installation prompts, streamlining the update process. This command may take several minutes depending on your internet connection and the number of pending updates.
After updates complete, especially if kernel updates were applied, consider rebooting your system to ensure all changes take effect properly:
sudo reboot
Wait a few minutes for your system to restart before reconnecting and continuing with the installation.
Step 2: Install Java Development Kit (JDK)
Apache Kafka is built on Java and requires a Java Runtime Environment. AlmaLinux 10 provides OpenJDK packages through its official repositories. Install OpenJDK 17, a long-term support (LTS) release, using this command:
sudo dnf install java-17-openjdk-devel -y
The java-17-openjdk-devel package includes the complete Java Development Kit with compilation tools and libraries. After installation completes, verify Java is properly configured:
java -version
You should see output indicating OpenJDK version 17 or later. Configure the JAVA_HOME environment variable for optimal compatibility:
echo "export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" >> ~/.bashrc
echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> ~/.bashrc
source ~/.bashrc
These commands add JAVA_HOME to your bash profile, making it available across terminal sessions. Verify the environment variable:
echo $JAVA_HOME
This should display the Java installation path, typically /usr/lib/jvm/java-17-openjdk.
Step 3: Create Dedicated Kafka User
Running services as root violates security best practices and increases vulnerability to potential exploits. Create a dedicated system user for Kafka operations:
sudo useradd -r -m -d /opt/kafka -s /bin/bash kafka
This command creates a system user named kafka with a home directory at /opt/kafka. The -r flag designates it as a system account, the -m flag creates the home directory, and -s /bin/bash assigns a login shell. System accounts enhance security by limiting privileges and isolating service operations.
Optionally, set a password for the kafka user if you need to perform manual operations:
sudo passwd kafka
However, for automated service management, passwordless operation through sudo is typically sufficient.
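A quick sanity check confirms the account exists with the expected home directory and shell:
id kafka
getent passwd kafka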
Step 4: Download Apache Kafka
Navigate to the Apache Kafka official download page or use wget to download the latest stable release directly. As of this writing, Kafka 3.6.x represents the stable production branch; if 3.6.1 has been superseded by the time you read this, take the current release from the download page, as older versions move to archive.apache.org. Download the binary distribution:
cd /tmp
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
The filename format kafka_2.13-3.6.1.tgz indicates Kafka version 3.6.1 compiled with Scala 2.13. Always verify you’re downloading from the official Apache mirror to ensure authenticity and security. For added security, download and verify the checksum:
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz.sha512
sha512sum kafka_2.13-3.6.1.tgz
Compare the computed hash against the contents of the .sha512 file. Apache publishes Kafka checksums in a format that sha512sum -c may not parse directly, so a manual comparison is the dependable approach. Matching hashes confirm file integrity and guard against corrupted or tampered downloads.
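Optionally, verify the release signature as well; Apache publishes the signing keys in the project's KEYS file:
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz.asc
wget https://downloads.apache.org/kafka/KEYS
gpg --import KEYS
gpg --verify kafka_2.13-3.6.1.tgz.asc kafka_2.13-3.6.1.tgz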
Step 5: Extract and Install Kafka
With the download complete, extract the Kafka archive to the kafka user’s home directory:
sudo tar -xzf kafka_2.13-3.6.1.tgz -C /opt/kafka --strip-components=1
The --strip-components=1 flag removes the top-level directory from the archive, placing Kafka files directly in /opt/kafka rather than creating a nested subdirectory. This simplifies path management and configuration.
Set proper ownership so the kafka user can read and modify necessary files:
sudo chown -R kafka:kafka /opt/kafka
Verify the installation by listing the directory contents:
ls -la /opt/kafka
You should see directories including bin (executables), config (configuration files), libs (Java libraries), and licenses; a logs directory appears after the broker first runs.
Step 6: Configure ZooKeeper
ZooKeeper manages Kafka cluster metadata, broker coordination, and leader election. Locate the ZooKeeper configuration file:
sudo nano /opt/kafka/config/zookeeper.properties
Review and modify key configuration parameters. The dataDir setting specifies where ZooKeeper stores snapshots and transaction logs:
dataDir=/var/lib/zookeeper
Create this directory with appropriate permissions:
sudo mkdir -p /var/lib/zookeeper
sudo chown -R kafka:kafka /var/lib/zookeeper
The clientPort defines the port ZooKeeper listens on for client connections:
clientPort=2181
Port 2181 is the standard ZooKeeper port. Additional settings like maxClientCnxns=0 remove connection limits, while tickTime=2000 sets the basic time unit in milliseconds for heartbeats and timeouts.
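Put together, a minimal zookeeper.properties for this guide looks like the sketch below; the whitelist line is optional and only needed if you want the four-letter-word health checks used later in the troubleshooting section:
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
4lw.commands.whitelist=srvr, stat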
Step 7: Create ZooKeeper Systemd Service
Systemd service files enable automatic startup, monitoring, and management. Create a ZooKeeper service unit:
sudo nano /etc/systemd/system/zookeeper.service
Add the following configuration:
[Unit]
Description=Apache Zookeeper Server
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk"
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
The [Unit] section defines service metadata and dependencies. The [Service] section specifies execution parameters, including the user context, startup command, and restart behavior. The [Install] section determines when the service activates during boot.
Reload systemd to recognize the new service:
sudo systemctl daemon-reload
Step 8: Configure Kafka Broker
Kafka’s main configuration file controls broker behavior, networking, storage, and retention policies. Open the server properties file:
sudo nano /opt/kafka/config/server.properties
Configure essential parameters. The broker.id uniquely identifies each broker in a cluster:
broker.id=0
For single-broker installations, use 0. Multi-broker clusters require unique IDs for each broker.
The listeners parameter defines network interfaces and ports:
listeners=PLAINTEXT://localhost:9092
For remote access, replace localhost with your server's IP address or use 0.0.0.0 to bind all interfaces. In the latter case, also set advertised.listeners to the address clients should connect to (for example, advertised.listeners=PLAINTEXT://your.server.ip:9092), since the broker hands this address back to clients.
Configure data storage locations:
log.dirs=/var/kafka-logs
Create this directory with proper ownership:
sudo mkdir -p /var/kafka-logs
sudo chown -R kafka:kafka /var/kafka-logs
Set default partition count for new topics:
num.partitions=3
More partitions enable greater parallelism but consume more resources.
Configure ZooKeeper connection:
zookeeper.connect=localhost:2181
Set data retention policies:
log.retention.hours=168
log.retention.bytes=1073741824
These settings retain data for 168 hours (7 days) or up to 1GB per partition, whichever limit is reached first.
Enable topic deletion capability:
delete.topic.enable=true
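For reference, the settings changed in this step combine into a minimal server.properties sketch like the following; the commented advertised.listeners line is only needed for remote clients:
broker.id=0
listeners=PLAINTEXT://localhost:9092
#advertised.listeners=PLAINTEXT://your.server.ip:9092
log.dirs=/var/kafka-logs
num.partitions=3
zookeeper.connect=localhost:2181
log.retention.hours=168
log.retention.bytes=1073741824
delete.topic.enable=true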
Step 9: Create Kafka Systemd Service
Similar to ZooKeeper, create a systemd service for Kafka:
sudo nano /etc/systemd/system/kafka.service
Add this configuration:
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk"
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
The KAFKA_HEAP_OPTS environment variable allocates JVM heap memory. Adjust the 1G value based on your server’s available RAM. Production systems typically allocate 4-8GB or more.
Reload systemd configuration:
sudo systemctl daemon-reload
Step 10: Start and Enable Services
With configuration complete, start ZooKeeper first, as Kafka depends on it:
sudo systemctl start zookeeper
Enable ZooKeeper to start automatically on boot:
sudo systemctl enable zookeeper
Verify ZooKeeper is running properly:
sudo systemctl status zookeeper
You should see “active (running)” status with green indicators. Now start Kafka:
sudo systemctl start kafka
Enable Kafka for automatic startup:
sudo systemctl enable kafka
Check Kafka service status:
sudo systemctl status kafka
Both services should show active status. If either service fails to start, check logs using sudo journalctl -u zookeeper -n 50 or sudo journalctl -u kafka -n 50 to identify issues.
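For a quick combined check, systemctl can query both units at once; each line of output should read active:
sudo systemctl is-active zookeeper kafka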
Step 11: Configure Firewall Rules
AlmaLinux 10 includes firewalld for network security. Open necessary ports for Kafka and ZooKeeper communication:
sudo firewall-cmd --permanent --add-port=2181/tcp
sudo firewall-cmd --permanent --add-port=9092/tcp
Port 2181 enables ZooKeeper client connections, while port 9092 allows Kafka broker communication. Reload the firewall to apply changes:
sudo firewall-cmd --reload
Verify the rules are active:
sudo firewall-cmd --list-ports
For production environments exposed to the internet, consider restricting access to specific IP addresses or implementing VPN-based access.
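As a sketch, a firewalld rich rule can limit port 9092 to a trusted subnet (203.0.113.0/24 is a placeholder; substitute your own network) while removing the blanket rule added above:
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="203.0.113.0/24" port port="9092" protocol="tcp" accept'
sudo firewall-cmd --permanent --remove-port=9092/tcp
sudo firewall-cmd --reload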
Testing Apache Kafka Installation
Validate your Kafka installation through practical testing. Navigate to the Kafka bin directory:
cd /opt/kafka/bin
Create a test topic named “test-topic” with 3 partitions and a replication factor of 1:
./kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
You should see confirmation that the topic was created successfully. List all topics to verify:
./kafka-topics.sh --list --bootstrap-server localhost:9092
Start a console producer to send messages:
./kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
Type several messages, pressing Enter after each. For example:
Hello from AlmaLinux 10
Apache Kafka is running successfully
This is a test message
Press Ctrl+C to exit the producer. Open another terminal and start a console consumer:
cd /opt/kafka/bin
./kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
Your previously sent messages should appear in the consumer output. This confirms end-to-end functionality. Describe the topic to view configuration details:
./kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092
This displays partition information, leader assignments, and replica locations.
Essential Kafka Commands Reference
Master these commands for daily Kafka operations. Create topics with custom configurations:
./kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 5 --replication-factor 1 --config retention.ms=86400000
Delete topics when no longer needed:
./kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092
Alter topic configurations, such as increasing the partition count (partitions can be added but never removed):
./kafka-topics.sh --alter --topic my-topic --bootstrap-server localhost:9092 --partitions 10
List consumer groups to monitor consumption:
./kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
Describe consumer group details including lag:
./kafka-consumer-groups.sh --describe --group my-consumer-group --bootstrap-server localhost:9092
Perform producer performance testing:
./kafka-producer-perf-test.sh --topic perf-test --num-records 100000 --record-size 1000 --throughput -1 --producer-props bootstrap.servers=localhost:9092
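A matching consumer-side benchmark ships alongside it; for example, reading back the records produced above:
./kafka-consumer-perf-test.sh --topic perf-test --messages 100000 --bootstrap-server localhost:9092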
These commands provide comprehensive cluster management capabilities.
Kafka Security Hardening on AlmaLinux 10
Production deployments require robust security measures. Implement authentication using SASL (Simple Authentication and Security Layer). Configure SASL/SCRAM by creating users:
./kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config 'SCRAM-SHA-256=[password=mypassword]' --entity-type users --entity-name kafkauser
This registers the credential with the cluster; to enforce authentication, you must also enable a SASL listener and provide a JAAS configuration in server.properties.
Enable SSL/TLS encryption for data in transit. Generate SSL certificates using keytool:
keytool -keystore kafka.server.keystore.jks -alias localhost -keyalg RSA -validity 365 -genkey
Configure server.properties with SSL settings:
listeners=SSL://localhost:9093
ssl.keystore.location=/opt/kafka/config/kafka.server.keystore.jks
ssl.keystore.password=your-keystore-password
ssl.key.password=your-key-password
ssl.truststore.location=/opt/kafka/config/kafka.server.truststore.jks
ssl.truststore.password=your-truststore-password
Implement Access Control Lists (ACLs) for authorization:
./kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:kafkauser --operation Read --operation Write --topic my-topic
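Review existing permissions at any time by listing ACLs for a topic:
./kafka-acls.sh --bootstrap-server localhost:9092 --list --topic my-topic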
Configure SELinux on AlmaLinux 10 for enhanced security. Check SELinux status:
sudo getenforce
If SELinux is enforcing, create appropriate policies, or temporarily switch to permissive mode while you confirm the services start cleanly.
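If denials do appear in the audit log, one common workflow builds a local policy module from them (the module name kafka_local is our choice; review the generated rules before loading):
sudo dnf install policycoreutils-python-utils -y
sudo ausearch -m avc -ts recent | audit2allow -M kafka_local
sudo semodule -i kafka_local.pp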
Performance Optimization Tips
Optimize Kafka performance for AlmaLinux 10 environments. Tune JVM heap size based on available memory:
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"
Allocate 25-50% of system RAM to Kafka, leaving the remainder for the operating system page cache, which Kafka relies on heavily for log reads and writes.
Adjust OS-level parameters for optimal performance. Increase file descriptor limits by editing /etc/security/limits.conf (these entries apply to login sessions; the systemd unit needs its own override, shown after the lines below):
kafka soft nofile 100000
kafka hard nofile 100000
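Because Kafka runs as a systemd service rather than a login session, raise the limit in the unit as well through a drop-in override (the drop-in file name limits.conf is our choice):
sudo mkdir -p /etc/systemd/system/kafka.service.d
printf '[Service]\nLimitNOFILE=100000\n' | sudo tee /etc/systemd/system/kafka.service.d/limits.conf
sudo systemctl daemon-reload
sudo systemctl restart kafka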
Configure VM swappiness to minimize swapping:
sudo sysctl vm.swappiness=1
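This change takes effect immediately but does not survive a reboot; to persist it, drop the setting into a sysctl configuration file (the file name is our choice):
echo "vm.swappiness=1" | sudo tee /etc/sysctl.d/99-kafka.conf
sudo sysctl --system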
Optimize Kafka broker settings in server.properties:
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
Use SSD storage for Kafka logs to maximize I/O throughput. Configure appropriate retention policies balancing storage costs and data requirements.
Monitoring and Logging
Effective monitoring ensures reliable Kafka operations. View real-time Kafka logs using journalctl:
sudo journalctl -u kafka -f
The -f flag follows log output continuously. Check ZooKeeper logs similarly:
sudo journalctl -u zookeeper -f
Kafka exposes JMX metrics for comprehensive monitoring. Enable JMX in kafka.service by adding:
Environment="JMX_PORT=9999"
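After editing the unit file, reload systemd and restart Kafka, then confirm the JMX port is listening:
sudo systemctl daemon-reload
sudo systemctl restart kafka
sudo ss -tln | grep 9999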
Integrate monitoring tools like Prometheus and Grafana. Install JMX Exporter to expose Kafka metrics in Prometheus format. Download and configure the exporter, then visualize metrics using Grafana dashboards.
Monitor critical metrics including message throughput, consumer lag, disk usage, network I/O, and broker availability. Set up alerts for unusual patterns indicating potential issues.
Common Troubleshooting Issues
Address frequent installation and operational challenges. If services fail to start, check Java installation and JAVA_HOME configuration. Verify no port conflicts exist:
sudo ss -tuln | grep 9092
The ss utility ships with AlmaLinux 10 by default; the older netstat requires installing the net-tools package.
Connection refused errors typically indicate firewall blocking or incorrect listener configuration. Review server.properties listeners parameter and firewall rules.
ZooKeeper connection failures manifest as repeated connection timeout messages. Verify ZooKeeper is running and accessible:
echo srvr | nc localhost 2181
The srvr command is on ZooKeeper's default four-letter-word whitelist; stat also works once you add 4lw.commands.whitelist=srvr, stat to zookeeper.properties, as in the optional line shown in Step 6. If nc isn't installed, add it with sudo dnf install nmap-ncat -y.
Out of memory errors require JVM heap adjustment. Monitor memory usage and increase KAFKA_HEAP_OPTS allocation. Investigate memory leaks if issues persist despite adequate allocation.
Disk space issues arise from insufficient retention policy configuration. Monitor disk usage regularly:
df -h /var/kafka-logs
Adjust retention hours or bytes in server.properties based on available storage.
Permission denied errors indicate ownership or SELinux issues. Verify kafka user owns all necessary directories and files. Check SELinux denials:
sudo ausearch -m avc -ts recent
Upgrading and Maintenance
Plan upgrades carefully to minimize downtime. Before upgrading, backup critical data and configurations:
sudo tar -czf kafka-backup-$(date +%Y%m%d).tar.gz /opt/kafka/config /var/kafka-logs
For rolling upgrades in multi-broker clusters, upgrade one broker at a time. Stop the broker, replace binaries, update configurations if needed, and restart. Monitor cluster health before proceeding to the next broker.
AlmaLinux 10 provides long-term support with regular security updates. Keep the operating system current:
sudo dnf update -y
Review Kafka release notes before upgrading to understand breaking changes and new features. Test upgrades in non-production environments first.
Congratulations! You have successfully installed Apache Kafka. Thanks for using this tutorial to install the Apache Kafka distributed streaming platform on your AlmaLinux 10 system. For additional help or useful information, we recommend checking the official Apache Kafka website.