How To Install Apache Spark on Rocky Linux 9
In this tutorial, we will show you how to install Apache Spark on Rocky Linux 9. For those of you who didn’t know, Apache Spark is a free and open-source cluster-computing framework used for analytics, machine learning, and graph processing on large volumes of data. One of the key features of Spark is its in-memory data processing. Its core abstraction, the Resilient Distributed Dataset (RDD), lets it keep data in memory across a cluster and operate on it quickly. Spark also supports SQL queries and a DataFrame API, which make it easier for developers to express complex data operations.
This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, run your own VPS. The installation is quite simple and assumes you are running in the root account; if not, you may need to add ‘sudo’ to the commands to get root privileges. I will show you the step-by-step installation of Apache Spark on Rocky Linux 9.
Prerequisites
- A server running Rocky Linux 9.
- It’s recommended that you use a fresh OS install to prevent any potential issues.
- SSH access to the server (or just open Terminal if you’re on a desktop).
- An active internet connection. You’ll need an internet connection to download the necessary packages and dependencies for Apache Spark.
- A non-root sudo user or access to the root user. We recommend acting as a non-root sudo user, however, as you can harm your system if you’re not careful when acting as root.
Install Apache Spark on Rocky Linux 9
Step 1. The first step is to refresh the package metadata so that you see the latest available versions, and to install the DNF utilities. To do so, run the following commands:
sudo dnf check-update
sudo dnf install dnf-utils
Step 2. Installing Java.
Apache Spark runs on the Java Virtual Machine (it is written mostly in Scala), so we need to make sure that Java is installed on our Rocky Linux system. Run the following command to install OpenJDK 11:
sudo dnf install java-11-openjdk
Run the following command to check whether Java is installed:
java -version
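Spark 3.3 is documented as running on Java 8, 11, and 17, so it can be worth confirming the major version rather than only that a java binary exists. The following is a minimal sketch, not part of the original steps: the helper name java_major is made up for illustration, and it assumes the usual version "x.y.z" banner format printed by java -version.

```shell
# Hypothetical helper: print the major Java version found in a
# `java -version` banner passed as the first argument.
java_major() {
  # Pull the quoted version string, e.g. 11.0.17 or 1.8.0_352.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_352 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # modern scheme: 11.0.17 -> 11
  esac
}

# Only query the real JVM if one is installed; java prints its banner on stderr.
if command -v java >/dev/null 2>&1; then
  echo "Detected Java major version: $(java_major "$(java -version 2>&1)")"
fi
```

With the java-11-openjdk package from this step, the detected major version should be 11.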
Step 3. Installing Apache Spark on Rocky Linux 9.
By default, Spark is not available in the Rocky Linux 9 AppStream repository. Run the following command to download the Apache Spark 3.3.1 package from the Apache archive to your Rocky Linux system:
wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
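Apache publishes a SHA-512 checksum alongside each release archive, and comparing it before unpacking guards against a truncated or tampered download. The sketch below is only an illustration: verify_sha512 is a hypothetical helper, and the expected hash must be copied by hand from the .sha512 file published next to the archive (it is not reproduced here).

```shell
# Hypothetical helper: succeed only if FILE's SHA-512 digest equals EXPECTED.
# Usage: verify_sha512 FILE EXPECTED_HEX
verify_sha512() {
  file=$1
  # Normalise the expected value: drop whitespace, lowercase the hex digits.
  expected=$(printf '%s' "$2" | tr -d ' \n\t' | tr 'A-F' 'a-f')
  # sha512sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha512sum "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Example (placeholder hash; paste the published value in its place):
# verify_sha512 spark-3.3.1-bin-hadoop3.tgz "<published-sha512>" && echo "checksum OK"
```

If the function returns a non-zero status, delete the file and download it again before continuing.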
Next, extract the downloaded file with the following command:
tar -xvf spark-3.3.1-bin-hadoop3.tgz
Move the extracted directory to /opt/spark with the following command:
mv spark-3.3.1-bin-hadoop3 /opt/spark
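The tarball unpacks into a directory named after the archive without its .tgz suffix. As a small sketch (the variable names SPARK_TGZ and SPARK_DIR are illustrative, not from the tutorial), the name can be derived rather than typed, which avoids a mismatch between the archive name and the source of the mv command:

```shell
# Illustrative variables; adjust SPARK_TGZ if you downloaded another release.
SPARK_TGZ=spark-3.3.1-bin-hadoop3.tgz
# basename strips the given suffix, leaving the extracted directory name.
SPARK_DIR=$(basename "$SPARK_TGZ" .tgz)
echo "Extracted directory: $SPARK_DIR"
# Then, as root: mv "$SPARK_DIR" /opt/spark
```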
Then, create a non-privileged user and set proper ownership:
useradd spark
chown -R spark:spark /opt/spark
Step 4. Create a Systemd Service for Apache Spark.
Now we create a systemd service file for the Master using the following command:
nano /etc/systemd/system/spark-master.service
Add the following lines:
[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target
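Save and close the file, then create a slave (worker) service file. Spark 3.1 and later name these scripts start-worker.sh and stop-worker.sh; the start-slave.sh and stop-slave.sh names used here remain as deprecated aliases in Spark 3.3.1: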
nano /etc/systemd/system/spark-slave.service
Add the following lines:
[Unit]
Description=Apache Spark Slave
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-slave.sh spark://your-IP-server:7077
ExecStop=/opt/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
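The your-IP-server placeholder in ExecStart has to be replaced with the address of the machine running the master; 7077 is the port the standalone master listens on by default. A minimal sketch of filling it in follows. Here spark_master_url is a hypothetical helper and 192.0.2.10 is a documentation-only example address, so substitute your own.

```shell
# Hypothetical helper: build a standalone master URL; port defaults to 7077.
spark_master_url() {
  printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}

# Example address from the reserved documentation range; replace with yours.
MASTER_URL=$(spark_master_url 192.0.2.10)
echo "$MASTER_URL"

# To patch the unit file in place (run as root, after creating the file):
# sed -i "s|spark://your-IP-server:7077|$MASTER_URL|" /etc/systemd/system/spark-slave.service
```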
Save and close the file, then reload the systemd daemon to apply the changes, start the Spark master service, and enable it at boot:
sudo systemctl daemon-reload
sudo systemctl start spark-master
sudo systemctl enable spark-master
Step 5. Configure Firewall.
Now we allow port 8080, which serves the Spark master web interface, through the firewall:
sudo firewall-cmd --zone=public --permanent --add-port=8080/tcp
sudo firewall-cmd --reload
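If a worker on another host needs to reach the master, the master port (7077 by default) would need opening as well as the web interface port. The sketch below prints the firewall-cmd invocations for a list of ports so they can be reviewed before being run. The name firewall_cmds is a made-up helper, and the port list is an assumption about your setup rather than something this tutorial requires.

```shell
# Hypothetical helper: print (not run) the commands to open each TCP port.
firewall_cmds() {
  for port in "$@"; do
    printf 'firewall-cmd --zone=public --permanent --add-port=%s/tcp\n' "$port"
  done
  # A single reload at the end applies all permanent rules.
  printf 'firewall-cmd --reload\n'
}

# Review the output first:
firewall_cmds 8080 7077
# If it looks right, it can be piped to a root shell:
# firewall_cmds 8080 7077 | sudo sh
```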
Step 6. Accessing Apache Spark Web Interface.
Once successfully installed, open your web browser and access Apache Spark using the URL http://your-IP-address:8080. You should see the Spark master web interface.
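On a headless server it can be handy to confirm from the shell that the web interface answers before opening a browser. This is a minimal sketch, assuming curl is installed; check_ui is a hypothetical helper, and it treats only an HTTP 200 response as success.

```shell
# Hypothetical helper: succeed if http://HOST:PORT/ answers with HTTP 200.
# Usage: check_ui HOST [PORT]   (PORT defaults to 8080, the master web UI)
check_ui() {
  url="http://$1:${2:-8080}/"
  # -m 5 caps the wait at five seconds; -w prints only the status code.
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" 2>/dev/null) || code=000
  [ "$code" = "200" ]
}

# Example, run on the Spark host itself:
# check_ui localhost && echo "Spark master UI is up"
```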
Congratulations! You have successfully installed Spark. Thanks for using this tutorial for installing Apache Spark on your Rocky Linux 9 system. For additional help or useful information, we recommend you check the official Apache website.