How To Install Apache Spark on openSUSE
In this tutorial, we will show you how to install Apache Spark on openSUSE. Apache Spark has become an essential tool for big data processing, offering lightning-fast performance and a wide range of features. As a data scientist, engineer, or enthusiast, you may find yourself needing to install Apache Spark on your openSUSE system.
This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, that you host your environment on your own VPS or server. The installation is quite simple and assumes you are running as the root account; if not, you may need to prepend ‘sudo‘ to the commands to get root privileges. I will show you the step-by-step installation of Apache Spark on openSUSE.
Prerequisites
- A server running openSUSE (Leap or Tumbleweed)
- It’s recommended that you use a fresh OS install to prevent any potential issues.
- You will need access to the terminal to execute commands. openSUSE provides the Terminal application for this purpose. It can be found in your Applications menu.
- You’ll need an active internet connection.
- You’ll need administrative (root) access or a user account with sudo privileges.
Install Apache Spark on openSUSE
Step 1. Update System Packages.
To ensure a smooth installation process, it’s always a good practice to update your system packages to their latest versions. Open a terminal and run the following commands:
sudo zypper refresh
sudo zypper update
These commands will refresh the package repositories and update any outdated packages on your system.
Step 2. Installing Java Development Kit (JDK)
Apache Spark requires Java to run, so you’ll need to install the Java Development Kit (JDK) on your openSUSE system. To install JDK, run the following command in your terminal:
sudo zypper install java-11-openjdk
This command will install OpenJDK 11, which is a popular choice for Apache Spark. Once the installation is complete, you can verify the Java installation by running:
java -version
Step 3. Installing Scala.
Apache Spark is written in Scala, so you’ll need to install Scala on your openSUSE system. To install Scala, run the following command in your terminal:
sudo zypper install scala
This command will install the latest version of Scala available in the openSUSE repositories. After the installation is complete, you can verify the Scala installation by running:
scala -version
Step 4. Installing Apache Spark on openSUSE.
Now that you have the prerequisites installed, it’s time to download Apache Spark with the following command:
wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
Note that the closer.lua links on the Apache download page are mirror selectors that return an HTML page, not the tarball itself; the archive URL above always points directly at the file. If a newer release is available, adjust the version number accordingly.
Once the download is complete, you need to extract the Apache Spark package. Use the following command to extract the downloaded tarball:
tar xvf spark-3.5.1-bin-hadoop3.tgz
This command will extract the contents of the package into a directory named spark-3.5.1-bin-hadoop3. You can move this directory to a desired location, such as /opt/spark, using the following command:
sudo mv spark-3.5.1-bin-hadoop3 /opt/spark
To make it easier to access Apache Spark from anywhere in your system, you need to set the necessary environment variables. Open the .bashrc file in your home directory using a text editor:
nano ~/.bashrc
Add the following lines at the end of the file:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
Save the changes and exit the text editor. To apply the changes, reload the .bashrc file using the following command:
source ~/.bashrc
Now, you can access the Spark binaries from anywhere in your terminal.
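If you prefer a non-interactive setup, the same two exports can be appended to ~/.bashrc from the shell. The snippet below is a small sketch that assumes the /opt/spark location used above and skips the append if the entry is already present:

```shell
# Append the Spark environment variables to ~/.bashrc if not already there
# (assumes Spark was moved to /opt/spark as in the step above),
# then set them for the current shell as well.
touch ~/.bashrc
grep -q 'SPARK_HOME=/opt/spark' ~/.bashrc || cat >> ~/.bashrc <<'EOF'
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
EOF
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
echo "SPARK_HOME is set to: $SPARK_HOME"
```

Because of the grep guard, running the snippet twice will not duplicate the lines in your .bashrc.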
Step 5. Installing PySpark (Optional)
If you plan to use Apache Spark with Python, you’ll need to install PySpark. PySpark is the Python API for Apache Spark, allowing you to write Spark applications using Python. To install PySpark, run the following command:
pip install pyspark
This command will install PySpark and its dependencies using the Python package manager, pip.
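To confirm that pip installed the package against the Python interpreter you intend to use, you can attempt the import directly. This is a quick sketch; the wording of the failure hint is my own:

```shell
# Check that the pyspark module is importable from python3.
# Prints the installed version, or a hint if pip targeted a different interpreter.
python3 - <<'EOF'
try:
    import pyspark
    print("pyspark version:", pyspark.__version__)
except ImportError:
    print("pyspark is not importable - check that pip installed into this python3")
EOF
```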
To verify that Apache Spark is installed correctly, you can start the Spark shell using the following command:
spark-shell
This command will launch the Spark shell, and you should see the Spark logo and version information displayed in the terminal. If you encounter any errors, double-check the installation steps and ensure that the environment variables are set correctly.
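If spark-shell fails to start, a quick first check is whether each tool from the earlier steps is actually visible on your PATH. The loop below simply reports what the shell can find; it makes no changes to the system:

```shell
# Report whether each command installed in steps 2-4 is visible on the PATH.
for cmd in java scala spark-shell; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found at $(command -v "$cmd")"
  else
    echo "$cmd: NOT FOUND - revisit the step that installs it"
  fi
done
```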
Step 6. Running a Simple Spark Application.
Now that you have Apache Spark installed and verified, let’s run a simple Spark application to count the number of lines in a text file. Create a new file named LineCount.scala and add the following code:
val textFile = spark.read.textFile("README.md")
val lineCount = textFile.count()
println(s"Number of lines: $lineCount")
This code reads the README.md file (assuming it exists in the current directory) and counts the number of lines in it. To run the application, use the following command:
spark-shell -i LineCount.scala
The Spark shell will execute the code, and you should see the output displaying the number of lines in the README.md file.
Congratulations! You have successfully installed Apache Spark. Thanks for using this tutorial for installing Apache Spark on your openSUSE system. For additional or useful information, we recommend you check the official Apache website.