How To Install Apache Spark on Ubuntu 22.04 LTS

In this tutorial, we will show you how to install Apache Spark on Ubuntu 22.04 LTS. Apache Spark is an open-source distributed computing framework used for large-scale data processing, machine learning, and real-time analytics. It provides high-level APIs in Scala, Java, Python, and R, and because it keeps intermediate data in memory where possible, it is often considerably faster than disk-based systems such as Hadoop MapReduce.

This article assumes you have at least basic knowledge of Linux and know how to use the shell. The installation is straightforward. The commands below use sudo where root privileges are needed; if you are logged in as root, you can omit it. I will show you the step-by-step installation of Apache Spark on Ubuntu 22.04. The same instructions should also work on Ubuntu 20.04 and other Debian-based distributions such as Linux Mint, Elementary OS, and Pop!_OS.

Prerequisites

  • A server running one of the following operating systems: Ubuntu 22.04, 20.04, and any other Debian-based distribution like Linux Mint.
  • It’s recommended that you use a fresh OS install to prevent any potential issues.
  • SSH access to the server (or just open Terminal if you’re on a desktop).
  • An active internet connection. You’ll need an internet connection to download the necessary packages and dependencies for Apache Spark.
  • A non-root sudo user or access to the root user. We recommend acting as a non-root sudo user, however, as you can harm your system if you’re not careful when acting as the root.

Install Apache Spark on Ubuntu 22.04 LTS Jammy Jellyfish

Step 1. Update your system.

Start by updating your system packages to ensure you have the latest versions installed. Open the terminal and run the following commands:

sudo apt update
sudo apt upgrade
sudo apt install wget apt-transport-https gnupg2 software-properties-common

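The first command fetches the latest package information from the Ubuntu repositories, the second upgrades the installed packages, and the third installs wget and a few supporting utilities.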

Step 2. Installing Java.

Spark 3.5 runs on Java 8, 11, or 17. If you don’t have Java installed on your system, you can install the default JDK, which is OpenJDK 11 on Ubuntu 22.04, by running the following command:

sudo apt install default-jdk

Verify the Java version by running the following command:

java -version
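
If you prefer to check the version from a script, the sketch below extracts the major version number. The helper name java_major is made up for this example, and the parsing assumes the usual quoted version string printed by java -version (for example "11.0.22" or the older "1.8.0_392" form).

```shell
# Print the major version for a Java version string.
# Handles both the legacy "1.8.0_392" form and the modern "11.0.22" form.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # 1.8.0_392 -> 8
    *)   echo "${v%%[-.+]*}" ;;            # 11.0.22 -> 11, 17 -> 17
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted field.
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Detected Java major version: $(java_major "$ver")"
else
  echo "java not found on PATH"
fi
```

The Spark 3.5 documentation lists Java 8, 11, and 17 as supported, so any of those results should be fine for this tutorial.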

Step 3. Installing Apache Spark on Ubuntu 22.04.

Apache Spark is not available in the default Ubuntu 22.04 repositories, so download it from the official Apache download server. The following command fetches Spark 3.5.1 built for Hadoop 3; check the Apache Spark downloads page for the current release and adjust the version number if needed:

wget https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
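
Before extracting, it is worth confirming that the download is intact. The sketch below compares a file's SHA-512 digest with an expected value. The function name verify_sha512 is invented for this example, and the expected digest is a placeholder that you would copy from the checksum published alongside the release on the Apache Spark downloads page.

```shell
# Return success only if the file's SHA-512 digest equals the expected hex string.
# Assumes the expected digest is lowercase hex, as printed by sha512sum.
verify_sha512() {
  file="$1"
  expected="$2"
  actual=$(sha512sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Placeholder: replace with the digest published for the release you downloaded.
EXPECTED_SHA512="paste-the-published-digest-here"
ARCHIVE="spark-3.5.1-bin-hadoop3.tgz"

if [ -f "$ARCHIVE" ]; then
  if verify_sha512 "$ARCHIVE" "$EXPECTED_SHA512"; then
    echo "Checksum OK"
  else
    echo "Checksum mismatch: do not extract this archive"
  fi
else
  echo "$ARCHIVE not found in the current directory"
fi
```

If the digests differ, delete the archive and download it again before continuing.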

Next, extract the package using the following command:

tar -xvzf spark-3.5.1-bin-hadoop3.tgz

Move the extracted package to the /usr/local directory using the following command:

sudo mv spark-3.5.1-bin-hadoop3 /usr/local/spark
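
A quick way to confirm that the move worked is to look for a few files that the later steps rely on. This is only a sketch: the function name check_spark_layout is hypothetical, and the file list is a small sample chosen for this tutorial, not a full integrity check.

```shell
# Check that a directory contains the Spark files used later in this tutorial.
check_spark_layout() {
  dir="${1:-/usr/local/spark}"
  for f in bin/spark-shell bin/spark-submit conf/spark-env.sh.template; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "Spark layout looks complete in $dir"
}

check_spark_layout /usr/local/spark || echo "Spark does not appear to be installed at /usr/local/spark yet"
```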

Step 4. Configure Apache Spark.

You need to configure Apache Spark by setting some environment variables. Open the .bashrc file using the following command:

nano ~/.bashrc

Add the following lines:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save and close the file, then reload the .bashrc file by running the following command:

source ~/.bashrc
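
If you would rather not edit the file by hand, the same two lines can be appended from the shell. The sketch below uses a hypothetical helper, append_once, that adds a line only when an identical line is not already present, so running it twice does not create duplicates. It is demonstrated on a temporary file so that nothing in your home directory changes.

```shell
# Append a line to a file only if an identical line is not already there.
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; single quotes keep the variables unexpanded.
demo=$(mktemp)
append_once 'export SPARK_HOME=/usr/local/spark' "$demo"
append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' "$demo"
append_once 'export SPARK_HOME=/usr/local/spark' "$demo"   # no-op: line already present
cat "$demo"
rm -f "$demo"
```

To apply it for real, pass ~/.bashrc as the second argument instead of the scratch file, then reload the file with source ~/.bashrc.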

Next, copy the default configuration file by running the following command:

cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

After that, we open the spark-env.sh file using the following command:

nano /usr/local/spark/conf/spark-env.sh

Add the following line, which points Spark to the OpenJDK 11 installation from Step 2:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
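
The path above matches OpenJDK 11 on a 64-bit x86 system; on other architectures or Java versions the directory name differs. A minimal sketch for deriving the value from the java binary actually in use follows. The helper name java_home_from_bin is an assumption made for this example.

```shell
# Given the resolved path of the java binary, print the JDK root two levels up.
# Example: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -> /usr/lib/jvm/java-11-openjdk-amd64
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

if command -v java >/dev/null 2>&1; then
  # readlink -f follows the symlink chain from /usr/bin/java to the real binary.
  java_bin=$(readlink -f "$(command -v java)")
  echo "export JAVA_HOME=$(java_home_from_bin "$java_bin")"
else
  echo "java not found on PATH; install it first (Step 2)"
fi
```

The printed line can be pasted into spark-env.sh in place of the hard-coded path.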

Step 5. Testing Apache Spark.

Now that we have installed and configured Apache Spark, let’s test the installation by running a simple Spark application. We will use the Spark shell to test the installation. Enter the following command to start the Spark shell:

spark-shell

This command opens the Spark shell. You should see the Spark logo followed by a scala> prompt, with output similar to this:

21/02/22 36:46:11 INFO SparkContext: Running Spark version 3.5.1
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
     /_/
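
For a non-interactive check, Spark ships a run-example launcher in its bin directory that can run the bundled SparkPi job. The sketch below wraps it in a hypothetical function, spark_smoke_test, and assumes $SPARK_HOME/bin is on your PATH as configured in Step 4.

```shell
# Run the bundled SparkPi example and print only its result line.
# The optional first argument overrides the launcher name (default: run-example).
spark_smoke_test() {
  runner="${1:-run-example}"
  if ! command -v "$runner" >/dev/null 2>&1; then
    echo "$runner not found; check SPARK_HOME and PATH" >&2
    return 127
  fi
  # Spark logs to stderr by default, so discard it and keep the result on stdout.
  "$runner" SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

spark_smoke_test || echo "Smoke test did not pass"
```

On a working installation this should print a single line beginning with "Pi is roughly".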

Congratulations! You have successfully installed Apache Spark. Thanks for using this tutorial for installing Apache Spark on Ubuntu 22.04 LTS. For additional help or useful information, we recommend you check the official Apache Spark website.

r00t

r00t is an experienced Linux enthusiast and technical writer with a passion for open-source software. With years of hands-on experience in various Linux distributions, r00t has developed a deep understanding of the Linux ecosystem and its powerful tools. He holds certifications in SCE and has contributed to several open-source projects. r00t is dedicated to sharing his knowledge and expertise through well-researched and informative articles, helping others navigate the world of Linux with confidence.