How To Install Apache Spark on Rocky Linux 9

In this tutorial, we will show you how to install Apache Spark on Rocky Linux 9. Apache Spark is a free and open-source cluster-computing framework used for analytics, machine learning, and graph processing on large volumes of data. One of its key features is in-memory data processing: its core abstraction, the Resilient Distributed Dataset (RDD), lets Spark keep data in memory across the cluster and operate on it quickly. Spark also supports SQL queries and a DataFrame API, which make it easier for developers to express complex data operations.

This article assumes you have at least basic knowledge of Linux, know how to use the shell, and have your own server or VPS to work on. The installation is quite simple and assumes you are running as the root account; if not, you may need to add ‘sudo’ to the commands to get root privileges. I will show you the step-by-step installation of Apache Spark on Rocky Linux 9.

Prerequisites

  • A server running Rocky Linux 9.
  • It’s recommended that you use a fresh OS install to prevent any potential issues.
  • SSH access to the server (or just open Terminal if you’re on a desktop).
  • An active internet connection. You’ll need an internet connection to download the necessary packages and dependencies for Apache Spark.
  • A non-root sudo user or access to the root user. We recommend acting as a non-root sudo user, as you can harm your system if you’re not careful when acting as root.

Install Apache Spark on Rocky Linux 9

Step 1. The first step is to refresh the package metadata, check for available updates, and install the dnf-utils package. To do so, run the following commands:

sudo dnf check-update
sudo dnf install dnf-utils

Step 2. Installing Java.

Apache Spark is written in Scala and runs on the Java Virtual Machine (JVM), so we need to make sure that Java is installed on our Rocky Linux system. Run the following command to install Java:

sudo dnf install java-11-openjdk

Run the following command to check that Java is installed:

java -version
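
If you want to script this check, the sketch below pulls the major version out of the string that java -version reports. The helper name java_major_version is our own and not part of any package, and the threshold of 11 is simply the version installed above.

```shell
# Print the major version from a Java version string.
# Handles the legacy "1.8.0_352" scheme and the modern "11.0.17" scheme.
java_major_version() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_352 -> 8.0_352
    esac
    printf '%s\n' "${version%%[!0-9]*}"   # keep only the leading digits
}

# java -version writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    if [ "$(java_major_version "$installed")" -ge 11 ]; then
        echo "Java $installed looks recent enough"
    else
        echo "Java $installed is older than the version this tutorial installs"
    fi
fi
```

The check is skipped when no java binary is on the PATH, so the snippet is safe to run before and after the installation.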

Step 3. Installing Apache Spark on Rocky Linux 9.

Spark is not available in the Rocky Linux 9 AppStream repository. Run the following command to download the Apache Spark 3.3.1 package to your Rocky Linux system:

wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
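
Before extracting the archive, it may be worth checking that the download is intact. Apache projects publish a .sha512 file next to each release archive; the sketch below compares a digest you copy from that file against the local file. The function name verify_sha512 is our own, and the digest in the usage comment is a placeholder, not the real value.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Returns 0 when the SHA-512 digest of FILE equals EXPECTED_HEX.
# The comparison ignores letter case and whitespace in EXPECTED_HEX.
verify_sha512() {
    local file="$1" expected actual
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f' | tr -d '[:space:]')"
    actual="$(sha512sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the digest from the .sha512 file):
# verify_sha512 spark-3.3.1-bin-hadoop3.tgz "<digest-from-sha512-file>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

A mismatch usually means the download was truncated or that an HTML page was saved instead of the archive, in which case it is better to download the file again.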

Next, extract the downloaded file with the following command:

tar -xvf spark-3.3.1-bin-hadoop3.tgz

Move the extracted directory to /opt/spark with the following command:

mv spark-3.3.1-bin-hadoop3 /opt/spark

Then, create a non-privileged user and set proper ownership:

useradd spark
chown -R spark:spark /opt/spark

Step 4. Create Systemd Services for Apache Spark.

Now we create a systemd service file for the master using the following command:

nano /etc/systemd/system/spark-master.service

Add the following lines:

[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target

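Save and close the file. Spark 3.1 and later prefer the names start-worker.sh and stop-worker.sh for the worker scripts, though the slave-named scripts used here are kept as deprecated aliases. Now create the service file for the worker (slave):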

nano /etc/systemd/system/spark-slave.service

Add the following lines:

[Unit]
Description=Apache Spark Slave
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-slave.sh spark://your-IP-server:7077
ExecStop=/opt/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
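
The your-IP-server placeholder in the unit file has to be replaced with the address of your master before the service can start. As a sketch of one way to avoid editing it by hand, the function below prints the same unit file with the address filled in. The name render_worker_unit is our own, and 192.0.2.10 in the usage comment is a documentation address, not a real server.

```shell
# render_worker_unit MASTER_HOST [MASTER_PORT]
# Prints the worker unit file from this step with the master address filled in.
# MASTER_PORT defaults to 7077, the port used in the unit file above.
render_worker_unit() {
    local master_host="$1" master_port="${2:-7077}"
    cat <<EOF
[Unit]
Description=Apache Spark Slave
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-slave.sh spark://${master_host}:${master_port}
ExecStop=/opt/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (writes the unit file; replace 192.0.2.10 with your master's address):
# render_worker_unit 192.0.2.10 | sudo tee /etc/systemd/system/spark-slave.service
```

Printing to standard output first lets you review the result before it is written under /etc/systemd/system.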

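Save and close the file, then reload the systemd daemon to apply the changes, and start and enable both services:

sudo systemctl daemon-reload
sudo systemctl start spark-master
sudo systemctl enable spark-master
sudo systemctl start spark-slave
sudo systemctl enable spark-slave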
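
A service may take a few seconds to become active after systemctl start returns. The sketch below is a small, generic retry helper (the name retry is our own) that can wrap systemctl is-active to wait for it, assuming the unit names created in this step.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Sleep only when another attempt is still to come.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Usage (waits up to roughly 20 seconds for the master service):
# retry 10 2 systemctl is-active --quiet spark-master \
#     && echo "spark-master is active" \
#     || sudo journalctl -u spark-master --no-pager -n 20
```

If the service never becomes active, the journalctl call in the usage comment shows the most recent log lines for the unit, which is usually the quickest way to find the cause.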

Step 5. Configure Firewall.

Now we allow port 8080, which serves the Spark master web interface, through the firewall:

sudo firewall-cmd --zone=public --permanent --add-port=8080/tcp
sudo firewall-cmd --reload
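
If workers on other machines need to reach this master, more than one port may have to be opened. The sketch below loops over a list of ports with the same firewall-cmd options used above. The function name open_ports and the FIREWALL_CMD override are our own additions; 7077 is the master port from the worker unit file, and we assume the default of 8081 for the worker web interface.

```shell
# open_ports PORT...
# Adds each TCP port permanently to the public zone.
# FIREWALL_CMD is a hypothetical override, e.g. "sudo firewall-cmd" for a
# non-root user or "echo" for a dry run that only prints the arguments.
open_ports() {
    local port
    for port in "$@"; do
        ${FIREWALL_CMD:-firewall-cmd} --zone=public --permanent --add-port="${port}/tcp" || return 1
    done
}

# Usage:
# FIREWALL_CMD="sudo firewall-cmd" open_ports 8080 7077 8081
# sudo firewall-cmd --reload
```

Only open the ports you actually need; on a single-server setup the web interface port alone is enough.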

Step 6. Accessing Apache Spark Web Interface.

Once successfully installed, open your web browser and access Apache Spark using the URL http://your-IP-address:8080. You should see the Spark master web interface, which lists the master URL and any registered workers.
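
To confirm that the cluster accepts work and not only that the web interface loads, you can submit the SparkPi example that ships with the Spark distribution. The sketch below assumes the /opt/spark layout from Step 3; the helper spark_master_url and the MASTER_HOST variable are our own names.

```shell
# spark_master_url HOST [PORT]
# Builds the standalone master URL (port 7077 unless another is given).
spark_master_url() {
    printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}

# Submit the bundled SparkPi example. This runs only when MASTER_HOST is set
# and Spark is installed under /opt/spark, for example:
#   MASTER_HOST=192.0.2.10 bash this-script.sh
if [ -n "${MASTER_HOST:-}" ] && [ -x /opt/spark/bin/spark-submit ]; then
    /opt/spark/bin/spark-submit \
        --master "$(spark_master_url "$MASTER_HOST")" \
        --class org.apache.spark.examples.SparkPi \
        /opt/spark/examples/jars/spark-examples_*.jar 10
fi
```

When the job succeeds, the output should include a line beginning with "Pi is roughly", and the finished application should appear in the web interface.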

Congratulations! You have successfully installed Spark. Thanks for using this tutorial for installing Apache Spark on your Rocky Linux 9 system. For additional help or useful information, we recommend you check the official Apache Spark website.
