How To Install Apache Cassandra on Manjaro

Apache Cassandra is a distributed NoSQL database built for horizontal scalability and fault tolerance. Installing it on Manjaro Linux gives developers and system administrators a capable platform for large-scale data workloads. This guide walks through three installation methods (an AUR package, Docker, and a manual tarball install) so you can choose the one that fits your requirements.

Manjaro’s Arch-based package management, current Java packages, and access to the AUR make it a convenient platform for running Cassandra, especially for development and testing. Because Manjaro is a rolling release, review updates to Java and Cassandra before applying them on machines that hold important data.

Understanding Apache Cassandra

What is Apache Cassandra

Apache Cassandra is a distributed wide-column NoSQL database designed to handle large amounts of data across many servers with no single point of failure. Unlike traditional relational databases, Cassandra uses a peer-to-peer architecture in which every node can serve requests, which gives continuous availability and near-linear scalability as nodes are added.

The database is optimized for write-heavy, high-availability workloads and can sustain thousands of writes per second per node while keeping read latency predictable. Organizations such as Netflix, Apple, and Instagram run Cassandra for mission-critical applications.

Cassandra Architecture Fundamentals

Cassandra’s ring-based architecture distributes data across multiple nodes using consistent hashing. Each node communicates with others through a gossip protocol, ensuring cluster-wide awareness without requiring a central coordinator. This design eliminates single points of failure while providing automatic data distribution and replication.
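To make the ring concrete, here is a toy lookup written as a portable shell function. It is a simplified sketch, not Cassandra’s code: real clusters hash partition keys with Murmur3 into signed 64-bit tokens and usually give each node many virtual-node tokens, while the hypothetical ring_owner helper below takes small non-negative integer tokens. The ownership rule is the real one, though: a key belongs to the first node whose token is greater than or equal to the key’s token, wrapping around to the lowest token at the end of the ring.

```shell
# Toy consistent-hashing lookup: which node owns a given token?
# Usage: ring_owner KEY_TOKEN "TOKEN:NODE TOKEN:NODE ..."
# Tokens are non-negative integers here (a simplification of Murmur3 tokens).
ring_owner() {
  key_token=$1
  ring=$2
  first_node=""
  owner=""
  # Visit ring entries in ascending numeric token order.
  for entry in $(printf '%s\n' $ring | sort -t: -k1,1n); do
    tok=${entry%%:*}
    node=${entry#*:}
    if [ -z "$first_node" ]; then
      first_node=$node
    fi
    # The first node whose token is >= the key's token owns the key.
    if [ "$key_token" -le "$tok" ]; then
      owner=$node
      break
    fi
  done
  # Past the highest token: wrap around to the node with the lowest token.
  printf '%s\n' "${owner:-$first_node}"
}
```

For example, hashing a key with cksum gives a stable token, so the same key always lands on the same node, and a newly added node only takes over the range just below its own token:

ring_owner "$(printf '%s' user42 | cksum | cut -d' ' -f1)" "1000000000:node1 2000000000:node2 3000000000:node3"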

Cassandra favors availability and partition tolerance: consistency is eventual by default and tunable per query through consistency levels such as ONE, QUORUM, and ALL. This suits applications that can tolerate brief consistency delays in exchange for high uptime. Each keyspace defines a replication factor, and Cassandra automatically stores that many copies of every row on different nodes.
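The usual way to reason about that trade-off is quorum arithmetic. A QUORUM read or write must be acknowledged by a majority of replicas, floor(RF / 2) + 1, so with a replication factor of 3 a quorum is 2 and one replica can be down without failing QUORUM requests. The two helpers below are hypothetical names used only to illustrate the calculation.

```shell
# Majority quorum for a replication factor: floor(RF / 2) + 1
quorum_size() {
  rf=$1
  echo $(( rf / 2 + 1 ))
}

# Replicas that can be down while QUORUM reads and writes still succeed
tolerated_failures() {
  rf=$1
  echo $(( rf - $(quorum_size "$rf") ))
}
```

With RF = 3, QUORUM writes plus QUORUM reads touch 2 + 2 = 4 replicas, which is more than 3, so every read overlaps at least one replica that acknowledged the latest write.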

System Requirements and Prerequisites

Hardware Requirements

Before installing Apache Cassandra on Manjaro, ensure your system meets the minimum hardware specifications. Memory requirements start at 4GB RAM for basic installations, though production deployments typically require 8-16GB or more. The database performs significantly better with increased memory allocation, as it relies heavily on RAM for caching and performance optimization.

CPU specifications should include at least 2 cores for minimal installations, with 8 cores representing the optimal balance between performance and cost for most deployments. Cassandra’s highly concurrent architecture benefits substantially from additional CPU cores, as both read and write operations are CPU-intensive processes.

Allow at least 10 GB of free disk space for installation and basic operation. Production environments using spinning disks should place the commit log and the data directories on separate drives; solid-state drives improve performance significantly and make that separation less important.
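As a quick preflight check, the hypothetical function below compares a machine against the minimums listed above (4 GB RAM, 2 CPU cores, 10 GB free disk) and prints one line per shortfall. It only performs the comparison; gathering the real values is left to the caller.

```shell
# Compare a host against the minimums: 4 GB RAM, 2 CPU cores, 10 GB free disk.
# Usage: check_min_requirements MEM_MB CORES FREE_DISK_GB
# Prints one line per shortfall; returns 1 if anything is below the minimum.
check_min_requirements() {
  mem_mb=$1
  cores=$2
  disk_gb=$3
  status=0
  if [ "$mem_mb" -lt 4096 ]; then
    echo "RAM: ${mem_mb} MB is below the 4096 MB minimum"
    status=1
  fi
  if [ "$cores" -lt 2 ]; then
    echo "CPU: ${cores} core(s) is below the 2-core minimum"
    status=1
  fi
  if [ "$disk_gb" -lt 10 ]; then
    echo "Disk: ${disk_gb} GB free is below the 10 GB minimum"
    status=1
  fi
  return "$status"
}
```

On the local machine, it can be fed from /proc/meminfo, nproc, and df. MemTotal is slightly below the installed RAM, so a machine with exactly 4 GB may be reported as short; treat the result as a warning rather than a hard limit:

check_min_requirements "$(awk '/^MemTotal/ {print int($2 / 1024)}' /proc/meminfo)" "$(nproc)" "$(df -BG --output=avail /var/lib | tail -n 1 | tr -dc '0-9')"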

Software Prerequisites

Java is the most critical prerequisite. Cassandra 4.0 and 4.1 run on Java 8 or Java 11 (Oracle JDK or OpenJDK) and do not support newer Java releases. Manjaro’s repositories provide OpenJDK packages such as jdk8-openjdk and jdk11-openjdk, so Java can be installed with pacman.
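Before installing, you can check whether the Java on your PATH is one of those versions. The sketch below is a hypothetical helper pair: java_major_version extracts the major version from the first line of java -version (handling both the old 1.8.0_xxx numbering and the newer 11.0.x numbering), and java_supported_for_cassandra4 accepts only 8 and 11, matching the Cassandra 4.0/4.1 requirement stated above.

```shell
# Extract the Java major version from the first line of `java -version`.
# Handles both the legacy "1.8.0_382" scheme and the modern "11.0.20" scheme.
java_major_version() {
  # The version is the double-quoted string, e.g. openjdk version "11.0.20"
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # "1.8.0_382" -> "8.0_382"
  esac
  # Keep only the digits before the first '.', '_' or '-'
  printf '%s\n' "${ver%%[._-]*}"
}

# Succeed only for the Java majors Cassandra 4.0/4.1 supports (8 and 11).
java_supported_for_cassandra4() {
  case "$(java_major_version "$1")" in
    8|11) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use (java -version writes to stderr, hence the redirect):

java_supported_for_cassandra4 "$(java -version 2>&1 | head -n 1)" && echo "Java version OK for Cassandra 4.x"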

Cassandra’s administrative tools, in particular the CQL shell (cqlsh), are written in Python. Manjaro ships Python 3 by default, but check that your Python version is supported by the cqlsh bundled with your Cassandra release, since cqlsh support for new Python versions can lag behind the distribution.

Pre-installation System Preparation

Update your Manjaro system before beginning the installation process:

sudo pacman -Syu

Install essential development tools and dependencies:

sudo pacman -S base-devel wget curl

Verify Java installation status:

java -version

If Java isn’t installed, install OpenJDK:

sudo pacman -S jdk11-openjdk

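Configure the JAVA_HOME environment variable (recent Manjaro editions use zsh as the default shell; in that case append to ~/.zshrc instead of ~/.bashrc):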

echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bashrc
source ~/.bashrc
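Rather than hard-coding the path, JAVA_HOME can be derived from whichever java launcher is currently active. The helper below is a hypothetical convenience that assumes the usual JDK layout, where the launcher lives at <JAVA_HOME>/bin/java. On Arch-based systems /usr/bin/java is a chain of symlinks into /usr/lib/jvm, so resolve it with readlink -f before passing it in.

```shell
# Derive JAVA_HOME from the path of a java launcher, assuming <JAVA_HOME>/bin/java.
java_home_from_bin() {
  bin_dir=$(dirname "$1")        # e.g. /usr/lib/jvm/java-11-openjdk/bin
  if [ "$(basename "$bin_dir")" != "bin" ]; then
    echo "unexpected layout: $1" >&2
    return 1
  fi
  dirname "$bin_dir"             # e.g. /usr/lib/jvm/java-11-openjdk
}
```

For example:

export JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"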

Installation Methods Overview

Available Installation Approaches

Manjaro offers several installation methods for Apache Cassandra, each suited to different use cases and experience levels. The AUR (Arch User Repository) method provides the most seamless integration with Manjaro’s package management system, while Docker installation offers containerized deployment benefits.

Manual tarball installation gives complete control over the installation process and file locations, which suits custom deployments or specific version requirements. Snap packages are another cross-distribution option, but they are not covered in this guide.

Method Comparison Analysis

The AUR approach builds a regular pacman package, so installed files are tracked by pacman, dependencies are resolved automatically, and the package can ship a systemd service. It works best for users who are comfortable with AUR helpers and want standard system integration.

Docker deployment offers isolation benefits and simplified version management, making it excellent for development environments or containerized production deployments. Docker installation requires minimal system changes and allows running multiple Cassandra versions simultaneously.

Tarball installation provides maximum flexibility and control over configuration options, though it requires more manual setup and maintenance. This approach suits advanced users needing specific configurations or custom deployment scenarios.

Method 1: Installing via AUR Package

Preparing the AUR Environment

Begin by installing an AUR helper if one is not already present on your system. yay is a popular choice:

sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/yay.git
cd yay
makepkg -si
cd .. && rm -rf yay

Alternatively, install paru for enhanced AUR management:

sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/paru.git
cd paru
makepkg -si
cd .. && rm -rf paru

Installing Java Prerequisites

Install OpenJDK if not already present:

sudo pacman -S jdk11-openjdk

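Configure JAVA_HOME system-wide. /etc/environment is read by pam_env, which does not expand variables such as $PATH, so use a profile script instead. PATH needs no change, because on Arch-based systems the java launcher is already linked into /usr/bin:

echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' | sudo tee /etc/profile.d/java-home.sh

Verify Java installation and configuration (log out and back in first so the profile script is read):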

java -version
echo $JAVA_HOME

Installing Cassandra from AUR

Search for available Cassandra packages in the AUR:

yay -Ss cassandra

Install the Apache Cassandra package:

yay -S cassandra

The helper downloads the PKGBUILD and the upstream release, resolves dependencies, and builds a pacman package. AUR packages are user-maintained, so review the PKGBUILD when prompted. The build may take several minutes depending on your system’s performance.

Initial Configuration and Service Setup

Enable the Cassandra service for automatic startup:

sudo systemctl enable cassandra.service

Start the Cassandra service:

sudo systemctl start cassandra.service

Check service status to ensure proper startup:

sudo systemctl status cassandra.service

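The AUR package is expected to provide this systemd unit, so the service is managed with standard systemctl commands. If systemctl reports that cassandra.service does not exist, list the package’s files with pacman -Ql cassandra to find the unit name.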

Method 2: Docker Installation

Docker Prerequisites and Setup

Install Docker on your Manjaro system:

sudo pacman -S docker docker-compose

Enable and start the Docker service:

sudo systemctl enable docker.service
sudo systemctl start docker.service

Add your user to the docker group to avoid requiring sudo for Docker commands:

sudo usermod -aG docker $USER

Log out and back in for group changes to take effect, then verify Docker installation:

docker --version
docker run hello-world

Pulling and Configuring Cassandra Image

Pull the official Apache Cassandra Docker image:

docker pull cassandra:latest

For specific version requirements, specify the version tag:

docker pull cassandra:4.0

Create a dedicated network for Cassandra containers:

docker network create cassandra-network

Running Cassandra Container

Launch a Cassandra container with persistent data storage:

docker run -d \
  --name cassandra-node1 \
  --network cassandra-network \
  -p 9042:9042 \
  -p 7000:7000 \
  -v cassandra-data:/var/lib/cassandra \
  -e CASSANDRA_CLUSTER_NAME='ManjaroCassandra' \
  cassandra:latest

Monitor container startup logs:

docker logs -f cassandra-node1

Wait for the “Startup complete” message before proceeding with configuration or connections.
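Instead of watching the log by hand, a script can poll until the node accepts CQL connections. Below is a generic retry helper (a hypothetical name, written in portable sh) that reruns a command until it succeeds or the attempts run out.

```shell
# Run a command repeatedly until it succeeds.
# Usage: retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt is coming.
    if [ "$attempt" -le "$max_tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, this waits up to about two and a half minutes (30 attempts, 5 seconds apart) for the container’s CQL shell to connect:

retry_until 30 5 docker exec cassandra-node1 cqlsh -e 'DESCRIBE KEYSPACES'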

Container Management and Access

Access the Cassandra container’s CQL shell:

docker exec -it cassandra-node1 cqlsh

Stop the container when needed:

docker stop cassandra-node1

Start the container after system reboot:

docker start cassandra-node1

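Alternatively, manage the same container with Docker Compose. Save the following as docker-compose.yml and start it with docker compose up -d; remove the container created above first (docker rm -f cassandra-node1), because both use the same container name:

services:
  cassandra:
    image: cassandra:latest
    container_name: cassandra-node1
    ports:
      - "9042:9042"
      - "7000:7000"
    volumes:
      - cassandra-data:/var/lib/cassandra
    environment:
      - CASSANDRA_CLUSTER_NAME=ManjaroCassandra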
    networks:
      - cassandra-network

volumes:
  cassandra-data:

networks:
  cassandra-network:

Method 3: Manual Tarball Installation

Downloading and Preparing Cassandra

Change to the directory where Cassandra will be installed:

cd /opt

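Download a release tarball. This section uses 4.1.3; check the Apache Cassandra download page for the current release and adjust the version in the commands: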

sudo wget https://archive.apache.org/dist/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz

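Verify the download integrity against the SHA-256 checksum published next to it on the Apache archive. The .sha256 file may contain only the hash, without a file name, so build the checksum line for sha256sum explicitly:

sudo wget https://archive.apache.org/dist/cassandra/4.1.3/apache-cassandra-4.1.3-bin.tar.gz.sha256
echo "$(awk '{print $1}' apache-cassandra-4.1.3-bin.tar.gz.sha256)  apache-cassandra-4.1.3-bin.tar.gz" | sha256sum -c -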

Extraction and Directory Setup

Extract the downloaded tarball:

sudo tar -xzf apache-cassandra-4.1.3-bin.tar.gz

Create a symbolic link for easier management:

sudo ln -s apache-cassandra-4.1.3 cassandra

Create a system user for Cassandra and give it ownership of the installation:

sudo useradd -r -s /bin/false cassandra
sudo chown -R cassandra:cassandra /opt/apache-cassandra-4.1.3
sudo chown -h cassandra:cassandra /opt/cassandra

Environment Configuration

Configure environment variables system-wide with a dedicated profile script (/etc/environment is not suitable here because it does not expand variables such as $PATH):

sudo tee /etc/profile.d/cassandra.sh << EOF
#!/bin/bash
export CASSANDRA_HOME=/opt/cassandra
export PATH=\$PATH:\$CASSANDRA_HOME/bin
EOF

sudo chmod +x /etc/profile.d/cassandra.sh

Manual Service Configuration

Create a systemd service file for Cassandra:

sudo tee /etc/systemd/system/cassandra.service << EOF
[Unit]
Description=Apache Cassandra
After=network.target

[Service]
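Type=forking
User=cassandra
Group=cassandra
# /run is a tmpfs: let systemd recreate /run/cassandra (owned by User=) on every start
RuntimeDirectory=cassandra
ExecStart=/opt/cassandra/bin/cassandra -p /run/cassandra/cassandra.pid
ExecStop=/bin/kill -TERM \$MAINPID
PIDFile=/run/cassandra/cassandra.pid
Restart=always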

[Install]
WantedBy=multi-user.target
EOF

Create necessary directories:

sudo mkdir -p /var/run/cassandra
sudo mkdir -p /var/log/cassandra
sudo chown cassandra:cassandra /var/run/cassandra
sudo chown cassandra:cassandra /var/log/cassandra

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable cassandra.service
sudo systemctl start cassandra.service

Post-Installation Configuration

Primary Configuration File Management

Locate and examine the main configuration file:

sudo find / -name "cassandra.yaml" 2>/dev/null

The configuration file typically resides at /etc/cassandra/cassandra.yaml for package installations or $CASSANDRA_HOME/conf/cassandra.yaml for manual installations.
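Instead of searching the whole filesystem with find, a script can check the two usual locations directly. The function below is a hypothetical sketch: it tries the package path first and then the tarball path under CASSANDRA_HOME (defaulting to /opt/cassandra, as in the tarball method). CASSANDRA_ETC_DIR is an invented override, used only to make the package path configurable.

```shell
# Print the path of the first cassandra.yaml that exists:
# the package location first, then the tarball location under CASSANDRA_HOME.
cassandra_conf_file() {
  for candidate in \
      "${CASSANDRA_ETC_DIR:-/etc/cassandra}/cassandra.yaml" \
      "${CASSANDRA_HOME:-/opt/cassandra}/conf/cassandra.yaml"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "cassandra.yaml not found" >&2
  return 1
}
```

For example:

sudo nano "$(cassandra_conf_file)"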

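Key parameters include the cluster name, the listen and RPC addresses, and the seed list (replication is configured per keyspace in CQL, not here). Edit the configuration file, adjusting the path for a tarball installation: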

sudo nano /etc/cassandra/cassandra.yaml

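Modify the essential settings. Set cluster_name before the node starts for the first time: Cassandra stores the name in its system tables and refuses to start if the configured name later differs.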

cluster_name: 'ManjaroCassandraCluster'
listen_address: localhost
rpc_address: localhost
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"

Network and Security Configuration

Configure network binding for your specific environment. For local development, localhost settings work adequately, while production deployments require specific IP address binding.

Authentication setup enhances security for production environments:

authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer

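If you use ufw (install it with sudo pacman -S ufw if needed), open the ports Cassandra uses, ideally only to the hosts that need them:

sudo ufw allow 7000/tcp  # Inter-node communication
sudo ufw allow 9042/tcp  # CQL client port
sudo ufw allow 7199/tcp  # JMX monitoring; only needed for remote JMX (JMX listens on localhost by default)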

Memory and Performance Optimization

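Configure JVM heap settings. Cassandra 3.x uses a single jvm.options file; Cassandra 4.x uses jvm-server.options plus Java-version-specific files such as jvm11-server.options (found under $CASSANDRA_HOME/conf for tarball installs):

sudo nano /etc/cassandra/jvm-server.options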

Set the heap size based on available system memory, using the same value for -Xms and -Xmx:

-Xms2G
-Xmx2G

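For systems with more than 8 GB of RAM, allocating 25-50% of total memory to the heap is a common starting point; the remaining memory is used by the operating system’s page cache, which Cassandra relies on for reads. To use G1 garbage collection, add the following flags (in jvm11-server.options on 4.x) and make sure only one collector is enabled: comment out any active CMS flags such as -XX:+UseConcMarkSweepGC first.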

-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=300
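For comparison with the heap values above: when no heap size is configured, Cassandra’s startup script (conf/cassandra-env.sh) picks a default maximum heap of max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)). The functions below re-implement that rule as a sketch so you can see the default for a given machine; they are hypothetical helpers, not part of Cassandra, and the result is a starting point rather than a tuned recommendation.

```shell
# Default max heap chosen by conf/cassandra-env.sh when no heap size is set:
#   max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))
default_max_heap_mb() {
  ram_mb=$1
  half=$(( ram_mb / 2 ))
  quarter=$(( half / 2 ))
  if [ "$half" -gt 1024 ]; then half=1024; fi
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  if [ "$half" -gt "$quarter" ]; then
    echo "$half"
  else
    echo "$quarter"
  fi
}

# Print matching -Xms/-Xmx lines for the JVM options file.
heap_options_for_ram() {
  heap=$(default_max_heap_mb "$1")
  printf '%s\n' "-Xms${heap}M" "-Xmx${heap}M"
}
```

For example, using the RAM reported by the kernel:

heap_options_for_ram "$(awk '/^MemTotal/ {print int($2 / 1024)}' /proc/meminfo)"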

Starting and Managing Cassandra Service

Service Management Operations

Start the Cassandra service using systemctl:

sudo systemctl start cassandra.service

Monitor service status to ensure proper startup:

sudo systemctl status cassandra.service

Enable automatic startup on system boot:

sudo systemctl enable cassandra.service

Service logs provide valuable troubleshooting information:

sudo journalctl -u cassandra.service -f

Verification and Health Checks

Use the nodetool status command to verify cluster health:

nodetool status

Successful output displays node status as “UN” (Up/Normal), indicating proper operation. Check cluster information:

nodetool info
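For scripted health checks, the status codes from nodetool status can be checked automatically. The hypothetical filter below reads that output on stdin, treats every line starting with a two-letter code (U or D for up/down, then N, L, J, or M for the state) as a node, and reports each node that is not UN. It assumes the address is the second column, as in the standard nodetool status layout.

```shell
# Read `nodetool status` output on stdin and report every node that is not "UN".
# Exit status: 0 if all nodes are UN, 1 if any node is not (or no nodes were found).
check_all_nodes_un() {
  awk '
    $1 ~ /^[UD][NLJM]$/ {
      nodes++
      if ($1 != "UN") { bad++; print "node " $2 " is " $1 }
    }
    END { exit !(nodes > 0 && bad == 0) }
  '
}
```

For example:

nodetool status | check_all_nodes_un && echo "all nodes are Up/Normal"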

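Test that the CQL port accepts connections. telnet is usually not installed on Manjaro, so this uses bash’s built-in /dev/tcp instead:

timeout 2 bash -c '</dev/tcp/localhost/9042' && echo "port 9042 is open"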

Initial Database Operations

Access the CQL shell for database operations:

cqlsh localhost

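Create your first keyspace. SimpleStrategy with a replication factor of 1 is adequate for a single development node; multi-node and production clusters should use NetworkTopologyStrategy with a higher replication factor: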

CREATE KEYSPACE test_keyspace WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

Use the keyspace and create a simple table:

USE test_keyspace;

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT
);

Insert test data to verify functionality:

INSERT INTO users (user_id, username, email) 
VALUES (uuid(), 'testuser', 'test@example.com');

Query the data to confirm proper operation:

SELECT * FROM users;

Troubleshooting Common Issues

Installation-Related Problems

Java compatibility issues represent the most common installation problems. Verify Java version compatibility:

java -version

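Cassandra 4.0 and 4.1 require Java 8 or 11 (Cassandra 5.0 supports Java 11 and 17). Install a supported version if needed and make it the system default with archlinux-java:

sudo pacman -S jdk11-openjdk
sudo archlinux-java set java-11-openjdk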

Permission errors during AUR installation often come from running yay or makepkg as root (both must run as a regular user) or from missing build tools:

sudo pacman -S --needed base-devel git

Build failures in AUR packages can often be resolved by cleaning the cached build files and rebuilding:

yay -Sc
yay -S cassandra --noconfirm

Service Startup Issues

Port conflicts prevent Cassandra from binding to its network ports. Identify conflicting processes with ss (netstat belongs to the net-tools package, which is not installed by default):

sudo ss -tlnp | grep -E ':(9042|7000) '

Stop conflicting services or modify Cassandra’s port configuration in cassandra.yaml.
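For scripts, the same check can be made on the output of ss -tln. The hypothetical filter below succeeds if anything listens on the given TCP port. It assumes the ss -tln layout, where the local address (such as 127.0.0.1:9042, *:7000, or [::]:9042) is the fourth column; adding -u or other socket types inserts a Netid column and shifts it.

```shell
# Read `ss -tln` output on stdin; succeed if something listens on TCP port $1.
port_in_use() {
  awk -v suffix=":$1" '
    # Compare the end of the local address column with ":PORT" (no regex needed).
    length($4) >= length(suffix) &&
      substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}
```

For example:

ss -tln | port_in_use 9042 && echo "port 9042 is already taken"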

Memory allocation problems occur on systems with insufficient RAM or incorrect JVM settings. Monitor memory usage:

free -h

Adjust the heap settings in the JVM options file (jvm.options on 3.x, jvm-server.options on 4.x) based on available memory.

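Configuration file syntax errors prevent service startup. Check that the YAML parses (this requires the python-yaml package and only catches syntax errors, not invalid Cassandra settings):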

python3 -c "import yaml; yaml.safe_load(open('/etc/cassandra/cassandra.yaml'))"

Connectivity and Performance Issues

Client connection failures often result from network binding or firewall configuration problems. Test local connectivity:

cqlsh localhost 9042

For remote connections, verify firewall rules and listen address configuration.

Performance degradation may indicate insufficient resources or suboptimal configuration. Monitor system resources (install htop and iotop with pacman if needed; iotop requires root):

htop
sudo iotop

Analyze Cassandra logs for performance warnings (tarball installations log to $CASSANDRA_HOME/logs/system.log by default):

sudo tail -f /var/log/cassandra/system.log

Security Best Practices

Access Control Implementation

Enable authentication and authorization for production deployments. Modify cassandra.yaml:

authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer

Restart Cassandra service after configuration changes:

sudo systemctl restart cassandra.service

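With authentication enabled, log in as the default superuser (cqlsh -u cassandra -p cassandra) and create your own administrative role. Afterwards, log in with the new role and disable the default account with ALTER ROLE cassandra WITH SUPERUSER = false AND LOGIN = false;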

CREATE ROLE admin WITH PASSWORD = 'secure_password' 
  AND LOGIN = true 
  AND SUPERUSER = true;

Role-based access control provides granular permissions management:

CREATE ROLE developer WITH PASSWORD = 'dev_password' 
  AND LOGIN = true;
  
GRANT SELECT ON KEYSPACE development TO developer;

System Hardening Measures

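Configure file permissions for Cassandra directories. cassandra.yaml can contain keystore passwords, so restrict it to the cassandra group; the chown is required, otherwise the service user can no longer read its own configuration:

sudo chmod 750 /var/lib/cassandra
sudo chmod 750 /var/log/cassandra
sudo chown root:cassandra /etc/cassandra/cassandra.yaml
sudo chmod 640 /etc/cassandra/cassandra.yaml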

If your installation method has not already created a dedicated service user (check with id cassandra), create one and give it the data directory:

sudo useradd -r -s /bin/false -d /var/lib/cassandra cassandra
sudo chown -R cassandra:cassandra /var/lib/cassandra

Implement network security through proper firewall configuration. Replace 192.168.1.0/24 with the network your clients connect from:

sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow from 192.168.1.0/24 to any port 9042 proto tcp
sudo ufw enable

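SSL/TLS encryption secures client-server communication. Configure encryption in cassandra.yaml; the keystore path and password below are placeholders for a keystore you create beforehand with the JDK’s keytool: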

client_encryption_options:
  enabled: true
  optional: false
  keystore: /path/to/keystore
  keystore_password: keystore_password

Maintenance and Monitoring

Regular Maintenance Tasks

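Cassandra’s bundled logback.xml already rotates system.log and debug.log, so check its settings before adding another rotation mechanism. If you still want logrotate to manage files in /var/log/cassandra, a rule like this works: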

sudo tee /etc/logrotate.d/cassandra << EOF
/var/log/cassandra/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF

Performance monitoring helps identify potential issues before they impact operations:

nodetool tablestats
nodetool tpstats

Snapshots are the basis of Cassandra backups. Create a snapshot of a keyspace; it is stored on the same disk as the data, so copy it elsewhere for a real backup:

nodetool snapshot test_keyspace
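Giving each snapshot a tag makes it easy to find and remove later. The sketch below wraps nodetool snapshot -t in a hypothetical helper that defaults the tag to backup-YYYYMMDD; the NODETOOL variable is an assumption added so the binary can be swapped, for example for /opt/cassandra/bin/nodetool on a tarball install.

```shell
# Take a tagged snapshot of one keyspace so it can be found and cleared later.
# Usage: snapshot_keyspace KEYSPACE [TAG]   (TAG defaults to backup-YYYYMMDD)
# NODETOOL may point at another nodetool binary (defaults to `nodetool` on PATH).
snapshot_keyspace() {
  ks=$1
  tag=${2:-backup-$(date +%Y%m%d)}
  "${NODETOOL:-nodetool}" snapshot -t "$tag" "$ks"
}
```

For example, snapshot_keyspace test_keyspace creates a dated snapshot; nodetool listsnapshots shows existing snapshots, and nodetool clearsnapshot -t <tag> removes one once it has been copied off the node.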

Schedule regular updates for security patches. Before an update replaces the Cassandra package, flush the node with nodetool drain and stop the service:

sudo pacman -Syu
yay -Sua  # For AUR packages

Monitoring Tools and Metrics

Built-in monitoring through nodetool provides comprehensive cluster insights:

nodetool status
nodetool ring
nodetool describecluster

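System resource monitoring ensures optimal performance. pgrep -f cassandra can match several processes, so look up the Cassandra JVM by its main class:

# PID of the Cassandra JVM
CASSANDRA_PID=$(pgrep -f org.apache.cassandra.service.CassandraDaemon)

# CPU usage
top -p "$CASSANDRA_PID"

# Memory usage
grep -E "VmRSS|VmSize" /proc/"$CASSANDRA_PID"/status

# Disk I/O (iotop requires root)
sudo iotop -p "$CASSANDRA_PID"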

Application-level monitoring tracks database-specific metrics:

nodetool cfstats
nodetool compactionstats
nodetool gcstats

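Set up automated alerting for critical metrics. The Load value reported by nodetool info is the amount of data stored on the node, printed with a unit such as KiB, MiB, or GiB, so the example converts it before comparing. It also assumes a working mail command (for example from the s-nail package):

#!/bin/bash
# Alert when the node stores more than THRESHOLD_GIB of data.
THRESHOLD_GIB=100
# nodetool info prints e.g. "Load : 70.71 KiB" (value in $3, unit in $4); convert to GiB
LOAD_GIB=$(nodetool info | awk '/^Load/ { v = $3; u = $4
  if (u == "bytes") v /= 1073741824; else if (u == "KiB") v /= 1048576
  else if (u == "MiB") v /= 1024; else if (u == "TiB") v *= 1024
  print v }')
if awk -v l="$LOAD_GIB" -v t="$THRESHOLD_GIB" 'BEGIN { exit !(l > t) }'; then
    echo "High load detected: ${LOAD_GIB} GiB" | mail -s "Cassandra Alert" admin@example.com
fi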

Next Steps and Advanced Configuration

Scaling and Cluster Expansion

Multi-node cluster setup requires careful planning of network topology and replication strategies. Prepare additional nodes by installing Cassandra using the same method across all servers.

Data modeling optimization significantly impacts performance. Design tables around query patterns rather than relational database principles. Use appropriate partition keys to ensure even data distribution.

Performance tuning involves adjusting compaction strategies, cache settings, and memory allocation based on workload characteristics. Compaction is configured per table in CQL rather than in cassandra.yaml:

-- SizeTieredCompactionStrategy (the default) suits write-heavy workloads
ALTER TABLE test_keyspace.users
  WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 4, 'max_threshold': 32};

Integration and Development Resources

Client driver installation enables application connectivity. Install drivers for your programming language:

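# Python driver (in a virtual environment; Manjaro's system Python is externally managed)
python -m venv ~/cassandra-venv
. ~/cassandra-venv/bin/activate
pip install cassandra-driver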

# Java driver (add to Maven pom.xml)
<dependency>
    <groupId>com.datastax.oss</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.14.1</version>
</dependency>

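Development best practices include reusing a single driver session (the drivers pool connections internally), using prepared statements, and handling errors properly. Play to Cassandra’s strengths with denormalized, query-driven data models; use batches to keep denormalized tables consistent, not as a bulk-loading performance optimization.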

Community resources provide ongoing support and learning opportunities. The Apache Cassandra community offers documentation, forums, and professional training programs for continued skill development.

Professional support options include DataStax Enterprise for mission-critical deployments requiring commercial support, enhanced security features, and advanced analytics capabilities.

Congratulations! You have successfully installed Apache Cassandra. Thanks for using this tutorial for installing Apache Cassandra on your Manjaro Linux system. For additional help or useful information, we recommend you check the official Apache website.

r00t

r00t is an experienced Linux enthusiast and technical writer with a passion for open-source software. With years of hands-on experience in various Linux distributions, r00t has developed a deep understanding of the Linux ecosystem and its powerful tools. He holds certifications in SCE and has contributed to several open-source projects. r00t is dedicated to sharing his knowledge and expertise through well-researched and informative articles, helping others navigate the world of Linux with confidence.