
How To Install Apache Spark on CentOS 7

In this tutorial, we will show you how to install Apache Spark on a CentOS 7 server. For those of you who didn’t know, Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, host your site on your own VPS. The installation is quite simple. The commands assume you are logged in as root; if you are not, prefix them with ‘sudo’ to get root privileges. I will show you the step-by-step installation of Apache Spark on a CentOS 7 server.

Prerequisites

  • A server running CentOS 7.
  • It’s recommended that you use a fresh OS install to prevent any potential issues.
  • A non-root sudo user or access to the root user. We recommend working as a non-root sudo user, however, because acting as root makes it easy to harm your system by mistake.

Install Apache Spark on CentOS 7

Step 1. First, let’s make sure your system is up to date and the EPEL repository is enabled.

yum clean all
yum -y install epel-release
yum -y update

Step 2. Installing Java.

Apache Spark runs on the Java Virtual Machine, and Spark 2.2.1 runs on Java 8. Install OpenJDK 8:

yum install -y java-1.8.0-openjdk

Once installed, check the Java version:

java -version
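
Spark 2.2.0 dropped support for Java 7, so it is worth confirming which major version the banner reports. The following is a minimal bash sketch; java_major_version is a hypothetical helper name, and it assumes the usual OpenJDK banner format such as openjdk version "1.8.0_161".

```shell
#!/usr/bin/env bash
# Minimal sketch: extract the Java major version from `java -version` output.
# Assumes an OpenJDK-style first line such as: openjdk version "1.8.0_161"

java_major_version() {
  local ver
  # Take the quoted version string from the first line of the banner on stdin.
  ver=$(head -n 1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;                # old scheme: 1.8.0_161 -> 8
    *)   echo "$ver" | cut -d. -f1 | cut -d- -f1 ;;  # new scheme: 11.0.2 -> 11
  esac
}
```

Because java -version writes to standard error, redirect it when piping: java -version 2>&1 | java_major_version should print 8 on a Java 8 install.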

Step 3. Installing Scala.

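The pre-built Spark package already bundles the Scala libraries it needs, so a separate Scala installation is optional. Install it if you want the standalone scala REPL or plan to compile Spark applications. Note that Spark 2.2.1 is built against Scala 2.11, so a 2.11.x release matches it better than the 2.10.1 archive used below: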

wget http://www.scala-lang.org/files/archive/scala-2.10.1.tgz
tar xvf scala-2.10.1.tgz
sudo mv scala-2.10.1 /usr/lib
sudo ln -s /usr/lib/scala-2.10.1 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin

Once installed, check the Scala version:

scala -version
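
The pre-built spark-2.2.1-bin-hadoop2.7 package is compiled against Scala 2.11, so if you intend to compile Spark applications with this Scala install, the major.minor versions should match. A rough bash sketch of that check follows; scala_binary_version and check_scala_matches_spark are hypothetical helper names, and 2.11 is hard-coded as the assumed default.

```shell
#!/usr/bin/env bash
# Rough sketch: compare the local Scala binary version with Spark's.

# Print "major.minor" from a banner like:
#   Scala code runner version 2.10.1 -- Copyright 2002-2013, LAMP/EPFL
scala_binary_version() {
  sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeed if the versions match; warn on stderr and return 1 otherwise.
# The default of 2.11 is an assumption for spark-2.2.1-bin-hadoop2.7.
check_scala_matches_spark() {
  local local_ver="$1" spark_ver="${2:-2.11}"
  if [ "$local_ver" = "$spark_ver" ]; then
    echo "OK: Scala $local_ver matches Spark's Scala $spark_ver"
  else
    echo "WARNING: Scala $local_ver differs from Spark's Scala $spark_ver" >&2
    return 1
  fi
}
```

Scala 2.x may print its version on standard error, so capture both streams: check_scala_matches_spark "$(scala -version 2>&1 | scala_binary_version)". With the 2.10.1 archive above, this reports a mismatch.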

Step 4. Installing Apache Spark.

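Download the pre-built Spark package (built for Hadoop 2.7) into your home directory, extract it, and point SPARK_HOME at it. Older releases such as 2.2.1 are kept on the Apache archive server:

cd ~
wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
export SPARK_HOME=$HOME/spark-2.2.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin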
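
Before moving on, a quick sanity check that SPARK_HOME really points at the extracted package can save confusion later. This bash sketch uses a hypothetical check_spark_home helper and only tests for a few files the binary distribution ships: bin/spark-shell, bin/spark-submit and sbin/start-master.sh.

```shell
#!/usr/bin/env bash
# Sketch: verify that a directory looks like an extracted Spark binary package.
# Defaults to $SPARK_HOME when no argument is given.
check_spark_home() {
  local home="${1:-$SPARK_HOME}" f
  if [ -z "$home" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Each of these should exist and be executable in the distribution.
  for f in bin/spark-shell bin/spark-submit sbin/start-master.sh; do
    if [ ! -x "$home/$f" ]; then
      echo "missing or not executable: $home/$f" >&2
      return 1
    fi
  done
  echo "Spark layout looks OK under $home"
}
```

Run check_spark_home with no argument to test the current SPARK_HOME; spark-submit --version then prints the Spark version banner.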

To make these environment variables persist across logins, add them to ~/.bash_profile and reload it:

echo 'export PATH=$PATH:/usr/lib/scala/bin' >> ~/.bash_profile
echo 'export SPARK_HOME=$HOME/spark-2.2.1-bin-hadoop2.7' >> ~/.bash_profile
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile
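
Running these echo lines twice appends duplicate entries to the profile. A small idempotent alternative, sketched below with a hypothetical append_once helper, only adds a line when that exact line is not already present.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not there yet.
# The target file defaults to ~/.bash_profile.
append_once() {
  local line="$1" file="${2:-$HOME/.bash_profile}"
  touch "$file"
  # -x: match whole lines, -F: treat the line as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export SPARK_HOME=$HOME/spark-2.2.1-bin-hadoop2.7' writes the variable literally (note the single quotes), and source ~/.bash_profile loads it into the current shell.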

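A standalone Spark cluster can be started manually, by running the start script on each node, or with the launch scripts Spark provides. For testing, you can run the master and a worker (slave) daemon on the same machine:

$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://$(hostname -f):7077

The worker must be given the master's URL. By default the master binds to the host's fully qualified name on port 7077; if the worker fails to register, use the exact spark:// URL printed in the master's log under $SPARK_HOME/logs.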
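
The start scripts return immediately while the daemons keep booting in the background, so a script that continues straight to the next step can race them. One way to wait, sketched here with a hypothetical wait_for_port helper, is to poll the port using bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port until it accepts connections or attempts run out.
# Usage: wait_for_port HOST PORT [ATTEMPTS], one attempt per second, default 30.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
    if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}
```

For example, wait_for_port localhost 8080 waits up to 30 seconds for the master web UI. If it times out, the daemon logs under $SPARK_HOME/logs usually explain why.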

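Step 5. Configure Firewall for Apache Spark.

Open the ports used by the standalone cluster: 6066 (REST submission server), 7077 (master), 8080 (master web UI) and 8081 (worker web UI):
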
firewall-cmd --permanent --zone=public --add-port=6066/tcp
firewall-cmd --permanent --zone=public --add-port=7077/tcp
firewall-cmd --permanent --zone=public --add-port=8080-8081/tcp
firewall-cmd --reload
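
If you prefer to keep the port list in one place, the same firewalld rules can be applied in a loop and then listed for verification. In this sketch, open_spark_ports and the FIREWALL_CMD override are hypothetical; setting FIREWALL_CMD=echo prints the commands instead of running them, which is handy for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: open Spark's standalone ports in a firewalld zone and list the result.
# FIREWALL_CMD is a hypothetical override (FIREWALL_CMD=echo gives a dry run).
open_spark_ports() {
  local fw="${FIREWALL_CMD:-firewall-cmd}" zone="${1:-public}" p
  # 6066: REST submission, 7077: master, 8080: master UI, 8081: worker UI
  for p in 6066/tcp 7077/tcp 8080-8081/tcp; do
    "$fw" --permanent --zone="$zone" --add-port="$p" || return 1
  done
  # Apply the permanent rules, then show the ports open in the zone.
  "$fw" --reload && "$fw" --zone="$zone" --list-ports
}
```

Run it as root with open_spark_ports (or open_spark_ports public); the final --list-ports output should include 6066/tcp, 7077/tcp and 8080-8081/tcp.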

Step 6. Accessing Apache Spark.

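The Spark master web UI listens on HTTP port 8080 by default; port 7077 is the master's RPC port that workers and applications connect to (spark://your-server-ip:7077), not a web page. Open your favorite browser and navigate to http://yourdomain.com:8080 or http://your-server-ip:8080 to see the master's status and its registered workers. The worker's own web UI is on port 8081.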
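
To confirm the cluster actually runs jobs, not just that the UI loads, you can submit the SparkPi example that ships with the binary package. This bash sketch assumes SPARK_HOME from Step 4, a master at spark://$(hostname -f):7077, and that the example jar is named spark-examples_2.11-2.2.1.jar under $SPARK_HOME/examples/jars (check with ls if your package differs); submit_spark_pi and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: submit the bundled SparkPi example to the standalone master.
# The jar path is an assumption for spark-2.2.1-bin-hadoop2.7.
submit_spark_pi() {
  local master="${1:-spark://$(hostname -f):7077}"
  local jar="$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar"
  local cmd=("$SPARK_HOME/bin/spark-submit"
             --master "$master"
             --class org.apache.spark.examples.SparkPi
             "$jar" 10)
  if [ -n "$DRY_RUN" ]; then
    echo "${cmd[*]}"   # DRY_RUN set: print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

A successful run prints a line starting with "Pi is roughly", and the finished application appears in the master web UI on port 8080.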

Congratulations! You have successfully installed Apache Spark on CentOS 7. Thanks for using this tutorial for installing Apache Spark on CentOS 7 systems. For additional help or useful information, we recommend you check the official Apache Spark website.

r00t

r00t is a seasoned Linux system administrator with a wealth of experience in the field. Known for his contributions to idroot.us, r00t has authored numerous tutorials and guides, helping users navigate the complexities of Linux systems. His expertise spans various Linux distributions, including Ubuntu, CentOS, and Debian. r00t's work is characterized by his ability to simplify complex concepts, making Linux more accessible to users of all skill levels. His dedication to the Linux community and his commitment to sharing knowledge make him a respected figure in the field.