How To Install Apache Spark on Ubuntu 18.04 LTS

In this tutorial, we will show you how to install Apache Spark on Ubuntu 18.04 LTS. For those who are not familiar with it, Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, host your site on your own VPS. The installation is quite simple. The commands assume you are running as the root account; if you are not, prefix them with ‘sudo‘ to get root privileges. I will show you the step-by-step installation of Apache Spark on an Ubuntu 18.04 LTS (Bionic Beaver) server.

Prerequisites

A server running Ubuntu 18.04 LTS (Bionic Beaver).
A fresh OS install is recommended to prevent any potential issues.
A non-root sudo user or access to the root user. We recommend acting as a non-root sudo user, because you can harm your system if you are not careful when acting as root.

Install Apache Spark on Ubuntu 18.04 LTS Bionic Beaver

Step 1. First, make sure that all your system packages are up-to-date by running the following apt-get commands in the terminal.

sudo apt-get update
sudo apt-get upgrade

Step 2. Installing Java.

Apache Spark requires Java to be installed on your server. Oracle Java is not available in Ubuntu’s default repositories, so add the Oracle Java PPA to Apt and install Java 8 with the following commands:

add-apt-repository ppa:webupd8team/java
apt-get update -y
apt-get install oracle-java8-installer

Verify the Java version by running the following command:

java -version
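
The raw output of java -version can be awkward to read in scripts, because Java 8 reports itself as 1.8.x while later releases report 11.x, 17.x, and so on. The following is a minimal sketch, not part of the original tutorial, that extracts the major version. The function name java_major_version is a hypothetical helper introduced here for illustration; the Spark 2.3.x documentation lists Java 8+ as its requirement.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# "1.8.0_181" -> 8 (legacy numbering), "11.0.2" -> 11 (modern numbering).
java_major_version() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;      # drop the legacy "1." prefix
    esac
    echo "${v%%[._-]*}"          # keep everything before the first ".", "_" or "-"
}

# Read the installed version, if java is on the PATH.
# java -version writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
    echo "Java major version: $(java_major_version "$installed")"
else
    echo "java not found on PATH"
fi
```

If the reported major version is below 8, revisit the Java installation step before continuing.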

Step 3. Installing Apache Spark on Ubuntu 18.04 LTS.

Download the Apache Spark release archive, extract it, and create a spark symlink that points to the extracted directory:

wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
tar xvzf spark-2.3.1-bin-hadoop2.7.tgz
ln -s spark-2.3.1-bin-hadoop2.7 spark
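
Before touching the PATH, it can help to confirm that the spark symlink really points at an unpacked distribution. The sketch below assumes it is run from the directory where the archive was extracted; is_spark_home is a hypothetical helper name, and the check only looks for the two launcher scripts that ship in the bin directory.

```shell
# Hypothetical helper: succeed if directory $1 looks like an unpacked
# Spark distribution (it contains executable launcher scripts).
is_spark_home() {
    [ -x "$1/bin/spark-shell" ] && [ -x "$1/bin/spark-submit" ]
}

if is_spark_home "./spark"; then
    # readlink -f resolves the symlink to the real directory.
    echo "Spark found at $(readlink -f ./spark)"
else
    echo "No Spark distribution at ./spark; check the download and extraction steps"
fi
```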

Adding Spark to Path:

nano ~/.bashrc

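Next, add these lines to the end of the .bashrc file so that PATH includes the Spark executables. This assumes the archive was extracted in your home directory; adjust SPARK_HOME if you extracted it elsewhere:

export SPARK_HOME=$HOME/spark
export PATH=$SPARK_HOME/bin:$PATH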

To apply these changes to the current shell session, reload the .bashrc file:

source ~/.bashrc
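
A quick way to confirm that the new settings took effect is to inspect SPARK_HOME and PATH from the same shell. This is a small sketch rather than a required step; path_has_dir is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

spark_home="${SPARK_HOME:-}"
echo "SPARK_HOME is: ${spark_home:-<not set>}"

if [ -n "$spark_home" ] && path_has_dir "$spark_home/bin"; then
    echo "PATH contains $spark_home/bin"
else
    echo "PATH does not contain the Spark bin directory yet"
fi

# Show where the shell would find spark-shell, if anywhere.
command -v spark-shell || echo "spark-shell is not on PATH yet"
```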

Launching Spark Shell:

./spark/bin/spark-shell
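
As a non-interactive alternative to the shell, the distribution ships a run-example script that can run the bundled SparkPi job as a smoke test. The sketch below assumes the spark symlink created earlier sits in the current directory; spark_smoke_test is a hypothetical wrapper name, and the argument 10 is just a small number of partitions chosen for a quick run.

```shell
# Hypothetical helper: run the bundled SparkPi example from a Spark directory.
spark_smoke_test() {
    home="$1"
    if [ ! -x "$home/bin/run-example" ]; then
        echo "run-example not found under $home" >&2
        return 1
    fi
    # Spark's default logging goes to stderr; the result line goes to stdout.
    "$home/bin/run-example" SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

spark_smoke_test ./spark || echo "Smoke test did not pass"
```

On a working installation this should print a single line starting with "Pi is roughly", followed by an approximation of pi that varies slightly between runs.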

Step 4. Accessing Apache Spark.

While a Spark application such as the Spark shell is running, its web UI is available on HTTP port 4040 by default. Open your favorite browser and navigate to http://your-domain.com:4040 or http://server-ip:4040 to view it.
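
The web UI can also be probed from the server itself, which helps tell a firewall problem apart from Spark not running. This is a rough sketch that assumes curl is installed and that a Spark application is currently running; spark_ui_url is a hypothetical helper. If port 4040 is already taken, Spark falls back to the next free port (4041, 4042, and so on), so pass that port instead.

```shell
# Hypothetical helper: build the web UI address for a host.
# Host defaults to localhost, port defaults to 4040.
spark_ui_url() {
    echo "http://${1:-localhost}:${2:-4040}"
}

url=$(spark_ui_url localhost)
if command -v curl >/dev/null 2>&1 &&
   curl -s -o /dev/null --max-time 5 "$url"; then
    echo "Spark UI is answering at $url"
else
    echo "No response from $url (is a Spark application running, and is curl installed?)"
fi
```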

Congratulations! You have successfully installed Apache Spark. Thanks for using this tutorial to install Apache Spark on an Ubuntu 18.04 LTS (Bionic Beaver) system. For additional help or useful information, we recommend checking the official Apache Spark website.
