
If you want to build a working big data environment on your local machine, setting up Apache Hadoop on Fedora 43 is one of the best ways to get hands-on experience with distributed computing without needing a multi-node server farm. Fedora 43 ships with a modern kernel, a fast DNF package manager, and solid OpenJDK support, which makes it an excellent host OS for Hadoop development and testing. In this guide, you will learn exactly how to install Apache Hadoop on Fedora 43 in pseudo-distributed mode — a single-node cluster where each Hadoop daemon runs in its own Java process. By the end, you will have a fully operational HDFS and YARN cluster running locally, and you will be able to submit your first MapReduce job.
What Is Apache Hadoop and Why Does It Matter?
Apache Hadoop is an open-source distributed computing framework built for storing and processing massive datasets across clusters of commodity hardware. It was originally created by Doug Cutting and Mike Cafarella, inspired by Google’s MapReduce and Google File System research papers, developed heavily at Yahoo, and is now maintained under the Apache Software Foundation.
Hadoop has three core components:
- HDFS (Hadoop Distributed File System): splits large files into blocks (default 128 MB) and distributes them across nodes with built-in replication for fault tolerance
- YARN (Yet Another Resource Negotiator): handles resource allocation and job scheduling across the cluster
- MapReduce: the default batch processing engine that runs jobs in parallel across distributed data
Real-world use cases span financial analytics, healthcare data pipelines, e-commerce recommendation engines, and log analysis at scale. Even if you only run Hadoop locally for now, understanding its internals will pay dividends the moment you scale to a real cluster.
Prerequisites
Before you touch a single command, confirm your environment meets these requirements:
Hardware:
- RAM: minimum 4 GB, recommended 8 GB
- Disk space: at least 20 GB free
- CPU: 2+ cores
Software and Access:
- Fedora 43 installed (fresh install recommended)
- A non-root user with sudo privileges
- Active internet connection for downloading packages and the Hadoop binary
- OpenJDK 11 or 17 (covered in Step 2)
- OpenSSH server and client (covered in Step 4)
- wget and tar utilities
Important note on SELinux: Fedora 43 enables SELinux by default. For local development, you can leave it in enforcing mode. If you run into unexpected permission denials later, you can temporarily set it to permissive with sudo setenforce 0 to isolate the issue.
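If you are unsure which mode SELinux is currently in, a quick check like the following sketch can save time later (the fallback message covers systems where the getenforce tool is not installed):

```shell
# Check the current SELinux mode before chasing permission errors.
# getenforce prints Enforcing, Permissive, or Disabled.
mode=$(getenforce 2>/dev/null || echo "Unavailable")
echo "SELinux mode: $mode"
if [ "$mode" = "Enforcing" ]; then
    echo "If a daemon hits an unexplained denial, try 'sudo setenforce 0' to rule SELinux in or out."
fi
```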
Step 1: Update Fedora 43 and Install Base Utilities
Always start with a clean, fully updated system. This prevents version conflicts between packages you install later.
sudo dnf update -y
After the update finishes, install the utility tools you will use throughout this guide:
sudo dnf install -y wget tar curl
- wget — downloads the Hadoop binary tarball from Apache mirrors
- tar — extracts the downloaded archive
- curl — useful for testing HTTP endpoints like the Hadoop Web UI
If the update pulled in a new kernel, reboot before moving on:
sudo reboot
This ensures you are running the latest kernel before you configure system-level services like SSH.
Step 2: Install Java (OpenJDK 11) on Fedora 43
Hadoop 3.4.x requires Java 8 or higher. OpenJDK 11 is stable, well-tested with Hadoop 3.x, and available directly from Fedora’s default repositories.
sudo dnf install -y java-11-openjdk java-11-openjdk-devel
Verify the installation worked:
java -version
javac -version
Expected output:
openjdk version "11.0.x" ...
OpenJDK Runtime Environment ...
OpenJDK 64-Bit Server VM ...
Now locate the exact JAVA_HOME path. You will need this in multiple configuration files later:
readlink -f /usr/bin/java
The output will look something like:
/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64/bin/java
Strip /bin/java from the end. Your JAVA_HOME is:
/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
Write this path down. You will use it in ~/.bashrc and hadoop-env.sh. If you have multiple Java versions installed, run sudo alternatives --config java to select the correct one.
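Rather than copying the path by hand, you can also derive it in one line; a small sketch:

```shell
# readlink -f follows the /etc/alternatives symlink chain to the real
# java binary; the two dirname calls strip the trailing /bin/java.
JAVA_HOME_PATH="$(dirname "$(dirname "$(readlink -f "$(which java)")")")"
echo "export JAVA_HOME=$JAVA_HOME_PATH"
```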
Step 3: Create a Dedicated Hadoop User
Running Hadoop as root is a security risk. Create a dedicated system user named hadoop to isolate all Hadoop processes and files.
sudo useradd -m -s /bin/bash hadoop
sudo passwd hadoop
Grant the hadoop user passwordless sudo access by adding an entry to the sudoers file:
sudo visudo
Add this line at the end of the file:
hadoop ALL=(ALL) NOPASSWD:ALL
Switch to the hadoop user for all remaining steps in this guide:
su - hadoop
Every Hadoop binary, configuration file, and data directory you create from this point forward should be owned by this user.
Step 4: Configure Passwordless SSH on Fedora 43
Hadoop daemons communicate with each other over SSH, even on a single-node setup. Without passwordless SSH configured, the start-dfs.sh and start-yarn.sh scripts will hang waiting for a password prompt that never gets answered.
Install and Enable SSH
sudo dnf install -y openssh-server
sudo systemctl enable --now sshd
Confirm the service is running:
sudo systemctl status sshd
You want to see Active: active (running) in the output.
Generate an SSH Key Pair
As the hadoop user, generate an RSA key pair with an empty passphrase:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
Add the public key to the authorized_keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Test Passwordless SSH
ssh localhost
Type yes when prompted about the host key fingerprint, then type exit to return. If this step succeeds without asking for a password, you are ready to proceed.
If SSH fails, check that sshd is active with systemctl status sshd and confirm your authorized_keys file has chmod 600 permissions.
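For a scriptable version of this test, the BatchMode option makes ssh fail immediately instead of prompting, turning "is passwordless SSH working?" into a yes/no check; a sketch:

```shell
# BatchMode=yes disables password prompts, so a non-zero exit means
# key-based login is not working yet.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
    echo "Passwordless SSH to localhost: OK"
else
    echo "Passwordless SSH to localhost: FAILED (check sshd and key permissions)"
fi
```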
Step 5: Download and Install Apache Hadoop 3.4.x
Download the Official Binary
Always download Hadoop from the official Apache mirrors to avoid tampered or outdated releases.
cd /tmp
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
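Apache publishes a .sha512 file alongside each release. One way to verify the download before extracting, sketched below, is to compare the local digest against that file by substring match, which copes with either the GNU ("<hash>  <file>") or BSD ("SHA512 (<file>) = <hash>") checksum layout, assuming the digest sits on a single line:

```shell
# Fetch the published checksum and confirm the local tarball's digest
# appears in it.
TARBALL=hadoop-3.4.1.tar.gz
wget -q "https://archive.apache.org/dist/hadoop/common/hadoop-3.4.1/${TARBALL}.sha512" || true
if [ -f "$TARBALL" ] && [ -f "${TARBALL}.sha512" ]; then
    local_sum=$(sha512sum "$TARBALL" | awk '{print $1}')
    if grep -qi "$local_sum" "${TARBALL}.sha512"; then
        echo "Checksum OK: safe to extract."
    else
        echo "Checksum MISMATCH: delete the file and re-download."
    fi
fi
```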
Extract and Move to /opt
sudo tar -xzf hadoop-3.4.1.tar.gz -C /opt/
sudo mv /opt/hadoop-3.4.1 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop
This installs Hadoop to /opt/hadoop, which is the standard location for third-party software on Linux systems.
Key directories you will work with:
- /opt/hadoop/bin/ — user-facing commands (hdfs, hadoop, yarn)
- /opt/hadoop/sbin/ — admin scripts (start-dfs.sh, start-yarn.sh)
- /opt/hadoop/etc/hadoop/ — all configuration files
- /opt/hadoop/logs/ — daemon log files (check here when things break)
Step 6: Set Hadoop and Java Environment Variables
Environment variables let you run Hadoop commands from any directory and ensure Hadoop’s internal scripts can locate Java at runtime.
Open the hadoop user’s .bashrc file:
nano ~/.bashrc
Append the following block at the bottom of the file. Replace the JAVA_HOME value with the exact path you found in Step 2:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes to your current session:
source ~/.bashrc
Verify Hadoop is reachable:
hadoop version
Expected output:
Hadoop 3.4.1
Source code repository ...
Compiled by ...
If you get command not found, double-check that PATH includes $HADOOP_HOME/bin and that you ran source ~/.bashrc.
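To confirm all the variables survived the round trip through .bashrc, a quick sanity loop like this sketch flags any that came through empty (${!v} is bash indirect expansion: the value of the variable named by $v):

```shell
# Print each required variable and warn on any that are unset.
for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    if [ -n "${!v}" ]; then
        echo "$v=${!v}"
    else
        echo "WARNING: $v is not set"
    fi
done
```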
Step 7: Configure the Hadoop Core Files
This is where the real work happens. You will edit five configuration files that define how Hadoop’s storage, resource management, and processing layers behave.
Configure hadoop-env.sh
This file sets the Java path for all Hadoop daemon startup scripts. Without it, daemons launched by start-dfs.sh can fail with a JAVA_HOME not set error because they do not always inherit your shell environment.
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Find the commented-out JAVA_HOME line and replace it with the hardcoded path:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
Save and close the file.
Configure core-site.xml
This file tells Hadoop where the NameNode is running. The NameNode is the master node that manages the HDFS filesystem namespace and metadata.
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration> block with:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopdata/tmp</value>
</property>
</configuration>
fs.defaultFS sets the default filesystem URI. Any HDFS path you reference without a full URI will resolve against this address.
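As an illustration (these commands only work once the cluster is running in Step 9), the two listings below refer to the same HDFS directory, because the bare path resolves against fs.defaultFS:

```shell
# Equivalent once fs.defaultFS points at hdfs://localhost:9000.
hdfs dfs -ls /user
hdfs dfs -ls hdfs://localhost:9000/user
```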
Configure hdfs-site.xml
This file defines HDFS replication and storage paths for the NameNode and DataNode.
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
dfs.replication=1 is correct for a single-node setup. On a real cluster with 3+ nodes, you would set this to 3 for fault tolerance.
Configure mapred-site.xml
This file tells MapReduce to use YARN for resource management instead of running jobs locally.
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
Without mapreduce.framework.name=yarn, Hadoop defaults to local mode and YARN is bypassed entirely.
Configure yarn-site.xml
This file enables the shuffle service that YARN’s NodeManager uses to serve intermediate map output to reducers.
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
The env-whitelist property ensures YARN passes the critical environment variables to containerized jobs.
Step 8: Create HDFS Directories and Format the NameNode
Create the Storage Directories
The paths you defined in hdfs-site.xml need to physically exist before you format the NameNode. Skipping this step is one of the most common beginner mistakes.
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
mkdir -p ~/hadoopdata/tmp
Format the NameNode
Format the HDFS filesystem to initialize the NameNode metadata store:
hdfs namenode -format
Look for this line in the output to confirm success:
Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
Critical warning: Only format the NameNode once on a cluster. If you format it again after data exists, you create a new cluster ID. The DataNode will refuse to connect because its stored cluster ID no longer matches the NameNode, and you will lose all HDFS data.
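Each daemon records its cluster ID in a plain-text VERSION file under its storage directory (the paths from hdfs-site.xml above), so comparing the two lines is a quick way to diagnose this exact mismatch; a sketch:

```shell
# Print the clusterID line from each daemon's VERSION file, if present.
for f in ~/hadoopdata/hdfs/namenode/current/VERSION \
         ~/hadoopdata/hdfs/datanode/current/VERSION; do
    if [ -f "$f" ]; then
        echo "$f:"
        grep clusterID "$f"
    else
        echo "$f not found (daemon not formatted or started yet)"
    fi
done
```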
Step 9: Start Hadoop Services and Verify the Cluster
Start HDFS and YARN
Start the HDFS daemons first (NameNode, DataNode, SecondaryNameNode):
start-dfs.sh
Then start the YARN daemons (ResourceManager, NodeManager):
start-yarn.sh
Verify with JPS
The jps command lists all running Java processes. On a healthy single-node Hadoop cluster, you should see the five Hadoop daemons below, plus the Jps process itself:
jps
Expected output:
XXXX NameNode
XXXX DataNode
XXXX SecondaryNameNode
XXXX ResourceManager
XXXX NodeManager
XXXX Jps
If any daemon is missing, open its log file under $HADOOP_HOME/logs/ and look for ERROR or FATAL lines.
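The jps check can be scripted; this sketch loops over the expected daemons and reports any that are not running (it assumes jps, which ships with the JDK, is on PATH):

```shell
# Compare the jps output against the list of expected daemons.
running=$(jps 2>/dev/null)
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$running" | grep -qw "$d"; then
        echo "OK:      $d"
    else
        echo "MISSING: $d (check \$HADOOP_HOME/logs/)"
    fi
done
```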
Access the Web Interfaces
Open a browser on your Fedora 43 machine and visit:
- HDFS NameNode UI: http://localhost:9870 — shows filesystem health, block reports, live DataNodes
- YARN ResourceManager UI: http://localhost:8088 — shows cluster resources, running and completed jobs
If either URL does not load, open the ports through Fedora’s firewall:
sudo firewall-cmd --add-port=9870/tcp --permanent
sudo firewall-cmd --add-port=8088/tcp --permanent
sudo firewall-cmd --reload
Step 10: Run a MapReduce Word Count Job to Validate the Setup
A working Web UI proves the daemons are alive. A successful MapReduce job proves the entire pipeline — HDFS read/write, YARN scheduling, and MapReduce execution — works end to end.
Create an HDFS input directory:
hdfs dfs -mkdir -p /user/hadoop/input
Create a test file and upload it to HDFS:
echo "Apache Hadoop on Fedora 43 Linux big data setup configure" > testfile.txt
hdfs dfs -put testfile.txt /user/hadoop/input/
Run the built-in WordCount example that ships with Hadoop:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
View the results:
hdfs dfs -cat /user/hadoop/output/part-r-00000
Expected output:
43 1
Apache 1
Fedora 1
Hadoop 1
Linux 1
big 1
configure 1
data 1
on 1
setup 1
If you see word counts in the output, your Apache Hadoop on Fedora 43 setup is fully operational.
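One common follow-up snag: MapReduce refuses to write into an output directory that already exists, so when re-running the same job, remove the old output first:

```shell
# Delete the previous run's output, then resubmit the job.
hdfs dfs -rm -r /user/hadoop/output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
```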
Troubleshooting Common Hadoop Installation Errors on Fedora 43
Even with a correct setup, a few issues come up regularly. Here are the most frequent ones and exactly how to fix them.
Error: JAVA_HOME is not set when starting daemons
- Cause: JAVA_HOME was added to ~/.bashrc but not to hadoop-env.sh
- Fix: Open $HADOOP_HOME/etc/hadoop/hadoop-env.sh and hardcode the full JAVA_HOME path. The daemon startup scripts do not inherit shell variables from ~/.bashrc.
Error: ssh: connect to host localhost port 22: Connection refused
- Cause: The sshd service is not running
- Fix: Run sudo systemctl start sshd and sudo systemctl enable sshd to start it and persist it across reboots.
Error: DataNode fails to start (cluster ID mismatch)
- Cause: The NameNode was formatted more than once after the DataNode had already written data
- Fix: Delete the DataNode data directory contents and restart: rm -rf ~/hadoopdata/hdfs/datanode/* && start-dfs.sh. Do not format the NameNode again — just restart the services.
Error: hdfs namenode -format fails with hostname resolution error
- Cause: The system hostname is not resolvable to 127.0.0.1 in /etc/hosts
- Fix: Open /etc/hosts and confirm you have this line: 127.0.0.1 localhost. Add your machine’s short hostname on the same line if needed.
Error: Port 9870 or 8088 not accessible in browser
- Cause: Fedora’s firewalld is blocking the ports
- Fix: Use the firewall-cmd commands shown in Step 9 to open ports 9870 and 8088, then reload the firewall rules.
Congratulations! You have successfully installed Apache Hadoop on Fedora 43 in pseudo-distributed mode. Thanks for using this tutorial. For additional help or deeper configuration details, check the official Apache Hadoop documentation.