
If you want to build a working big data environment on your local machine, setting up Apache Hadoop on Fedora 43 is one of the best ways to get hands-on experience with distributed computing without needing a multi-node server farm. Fedora 43 ships with a modern kernel, a fast DNF package manager, and solid OpenJDK support, which makes it an excellent host OS for Hadoop development and testing. In this guide, you will learn exactly how to install Apache Hadoop on Fedora 43 in pseudo-distributed mode — a single-node cluster where each Hadoop daemon runs in its own Java process. By the end, you will have a fully operational HDFS and YARN cluster running locally, and you will be able to submit your first MapReduce job.
What Is Apache Hadoop and Why Does It Matter?
Apache Hadoop is an open-source distributed computing framework built for storing and processing massive datasets across clusters of commodity hardware. It was originally created by Doug Cutting and Mike Cafarella, inspired by Google’s MapReduce and Google File System research papers, developed heavily at Yahoo, and is now maintained under the Apache Software Foundation.
Hadoop has three core components:
- HDFS (Hadoop Distributed File System): splits large files into blocks (default 128 MB) and distributes them across nodes with built-in replication for fault tolerance
- YARN (Yet Another Resource Negotiator): handles resource allocation and job scheduling across the cluster
- MapReduce: the default batch processing engine that runs jobs in parallel across distributed data
Real-world use cases span financial analytics, healthcare data pipelines, e-commerce recommendation engines, and log analysis at scale. Even if you only run Hadoop locally for now, understanding its internals will pay dividends the moment you scale to a real cluster.
Prerequisites
Before you touch a single command, confirm your environment meets these requirements:
Hardware:
- RAM: minimum 4 GB, recommended 8 GB
- Disk space: at least 20 GB free
- CPU: 2+ cores
Software and Access:
- Fedora 43 installed (fresh install recommended)
- A non-root user with sudo privileges
- Active internet connection for downloading packages and the Hadoop binary
- OpenJDK 11 or 17 (covered in Step 2)
- OpenSSH server and client (covered in Step 4)
- wget and tar utilities
Important note on SELinux: Fedora 43 enables SELinux by default. For local development, you can leave it in enforcing mode. If you run into unexpected permission denials later, you can temporarily set it to permissive with sudo setenforce 0 to isolate the issue.
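If you are unsure which mode SELinux is currently in, a quick check like the following sketch can save time later (the fallback message covers systems where the getenforce tool is not installed):

```shell
# Check the current SELinux mode before chasing permission errors.
# getenforce prints Enforcing, Permissive, or Disabled.
mode=$(getenforce 2>/dev/null || echo "Unavailable")
echo "SELinux mode: $mode"
if [ "$mode" = "Enforcing" ]; then
    echo "If a daemon hits an unexplained denial, try 'sudo setenforce 0' to rule SELinux in or out."
fi
```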
Step 1: Update Fedora 43 and Install Base Utilities
Always start with a clean, fully updated system. This prevents version conflicts between packages you install later.
sudo dnf update -y
After the update finishes, install the utility tools you will use throughout this guide:
sudo dnf install -y wget tar curl
- wget — downloads the Hadoop binary tarball from Apache mirrors
- tar — extracts the downloaded archive
- curl — useful for testing HTTP endpoints like the Hadoop Web UI
If the update pulled in a new kernel, reboot before moving on:
sudo reboot
This ensures you are running the latest kernel before you configure system-level services like SSH.
Step 2: Install Java (OpenJDK 11) on Fedora 43
Hadoop 3.4.x requires Java 8 or higher. OpenJDK 11 is stable, well-tested with Hadoop 3.x, and available directly from Fedora’s default repositories.
sudo dnf install -y java-11-openjdk java-11-openjdk-devel
Verify the installation worked:
java -version
javac -version
Expected output:
openjdk version "11.0.x" ...
OpenJDK Runtime Environment ...
OpenJDK 64-Bit Server VM ...
Now locate the exact JAVA_HOME path. You will need this in multiple configuration files later:
readlink -f /usr/bin/java
The output will look something like:
/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64/bin/java
Strip /bin/java from the end. Your JAVA_HOME is:
/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
Write this path down. You will use it in ~/.bashrc and hadoop-env.sh. If you have multiple Java versions installed, run sudo alternatives --config java to select the correct one.
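Rather than copying the path by hand, you can also derive it in one line; a small sketch:

```shell
# readlink -f follows the /etc/alternatives symlink chain to the real
# java binary; the two dirname calls strip the trailing /bin/java.
JAVA_HOME_PATH="$(dirname "$(dirname "$(readlink -f "$(which java)")")")"
echo "export JAVA_HOME=$JAVA_HOME_PATH"
```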
Step 3: Create a Dedicated Hadoop User
Running Hadoop as root is a security risk. Create a dedicated system user named hadoop to isolate all Hadoop processes and files.
sudo useradd -m -s /bin/bash hadoop
sudo passwd hadoop
Grant the hadoop user passwordless sudo access by adding an entry to the sudoers file:
sudo visudo
Add this line at the end of the file:
hadoop ALL=(ALL) NOPASSWD:ALL
Switch to the hadoop user for all remaining steps in this guide:
su - hadoop
Every Hadoop binary, configuration file, and data directory you create from this point forward should be owned by this user.
Step 4: Configure Passwordless SSH on Fedora 43
Hadoop daemons communicate with each other over SSH, even on a single-node setup. Without passwordless SSH configured, the start-dfs.sh and start-yarn.sh scripts will hang waiting for a password prompt that never gets answered.
Install and Enable SSH
sudo dnf install -y openssh-server
sudo systemctl enable --now sshd
Confirm the service is running:
sudo systemctl status sshd
You want to see Active: active (running) in the output.
Generate an SSH Key Pair
As the hadoop user, generate an RSA key pair with an empty passphrase:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
Add the public key to the authorized_keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Test Passwordless SSH
ssh localhost
Type yes when prompted about the host key fingerprint, then type exit to return. If this step succeeds without asking for a password, you are ready to proceed.
If SSH fails, check that sshd is active with systemctl status sshd and confirm your authorized_keys file has chmod 600 permissions.
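For a scriptable version of this test, the BatchMode option makes ssh fail immediately instead of prompting, turning "is passwordless SSH working?" into a yes/no check; a sketch:

```shell
# BatchMode=yes disables password prompts, so a non-zero exit means
# key-based login is not working yet.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
    echo "Passwordless SSH to localhost: OK"
else
    echo "Passwordless SSH to localhost: FAILED (check sshd and key permissions)"
fi
```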
Step 5: Download and Install Apache Hadoop 3.4.x
Download the Official Binary
Always download Hadoop from the official Apache mirrors to avoid tampered or outdated releases.
cd /tmp
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
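Apache publishes a .sha512 file alongside each release. One way to verify the download before extracting, sketched below, is to compare the local digest against that file by substring match, which copes with either the GNU ("<hash>  <file>") or BSD ("SHA512 (<file>) = <hash>") checksum layout, assuming the digest sits on a single line:

```shell
# Fetch the published checksum and confirm the local tarball's digest
# appears in it.
TARBALL=hadoop-3.4.1.tar.gz
wget -q "https://archive.apache.org/dist/hadoop/common/hadoop-3.4.1/${TARBALL}.sha512" || true
if [ -f "$TARBALL" ] && [ -f "${TARBALL}.sha512" ]; then
    local_sum=$(sha512sum "$TARBALL" | awk '{print $1}')
    if grep -qi "$local_sum" "${TARBALL}.sha512"; then
        echo "Checksum OK: safe to extract."
    else
        echo "Checksum MISMATCH: delete the file and re-download."
    fi
fi
```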
Extract and Move to /opt
sudo tar -xzf hadoop-3.4.1.tar.gz -C /opt/
sudo mv /opt/hadoop-3.4.1 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop
This installs Hadoop to /opt/hadoop, which is the standard location for third-party software on Linux systems.
Key directories you will work with:
- /opt/hadoop/bin/ — user-facing commands (hdfs, hadoop, yarn)
- /opt/hadoop/sbin/ — admin scripts (start-dfs.sh, start-yarn.sh)
- /opt/hadoop/etc/hadoop/ — all configuration files
- /opt/hadoop/logs/ — daemon log files (check here when things break)
Step 6: Set Hadoop and Java Environment Variables
Environment variables let you run Hadoop commands from any directory and ensure Hadoop’s internal scripts can locate Java at runtime.
Open the hadoop user’s .bashrc file:
nano ~/.bashrc
Append the following block at the bottom of the file. Replace the JAVA_HOME value with the exact path you found in Step 2:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes to your current session:
source ~/.bashrc
Verify Hadoop is reachable:
hadoop version
Expected output:
Hadoop 3.4.1
Source code repository ...
Compiled by ...
If you get command not found, double-check that PATH includes $HADOOP_HOME/bin and that you ran source ~/.bashrc.
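To confirm all the variables survived the round trip through .bashrc, a quick sanity loop like this sketch flags any that came through empty (${!v} is bash indirect expansion: the value of the variable named by $v):

```shell
# Print each required variable and warn on any that are unset.
for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    if [ -n "${!v}" ]; then
        echo "$v=${!v}"
    else
        echo "WARNING: $v is not set"
    fi
done
```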
Step 7: Configure the Hadoop Core Files
This is where the real work happens. You will edit five configuration files that define how Hadoop’s storage, resource management, and processing layers behave.
Configure hadoop-env.sh
This file sets the Java path for all Hadoop daemon startup scripts. Without it, daemons launched by start-dfs.sh can fail with a JAVA_HOME not set error because they do not always inherit your shell environment.
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Find the commented-out JAVA_HOME line and replace it with the hardcoded path:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.x.x-x.fc43.x86_64
Save and close the file.
Configure core-site.xml
This file tells Hadoop where the NameNode is running. The NameNode is the master node that manages the HDFS filesystem namespace and metadata.
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration> block with:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopdata/tmp</value>
</property>
</configuration>
fs.defaultFS sets the default filesystem URI. Any HDFS path you reference without a full URI will resolve against this address.
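As an illustration (these commands only work once the cluster is running in Step 9), the two listings below refer to the same HDFS directory, because the bare path resolves against fs.defaultFS:

```shell
# Equivalent once fs.defaultFS points at hdfs://localhost:9000.
hdfs dfs -ls /user
hdfs dfs -ls hdfs://localhost:9000/user
```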
Configure hdfs-site.xml
This file defines HDFS replication and storage paths for the NameNode and DataNode.
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
dfs.replication=1 is correct for a single-node setup. On a real cluster with 3+ nodes, you would set this to 3 for fault tolerance.
Configure mapred-site.xml
This file tells MapReduce to use YARN for resource management instead of running jobs locally.
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
Without mapreduce.framework.name=yarn, Hadoop defaults to local mode and YARN is bypassed entirely.
Configure yarn-site.xml
This file enables the shuffle service that YARN’s NodeManager uses to serve intermediate map output to reducers.
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
The env-whitelist property ensures YARN passes the critical environment variables to containerized jobs.
Step 8: Create HDFS Directories and Format the NameNode
Create the Storage Directories
The paths you defined in hdfs-site.xml need to physically exist before you format the NameNode. Skipping this step is one of the most common beginner mistakes.
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
mkdir -p ~/hadoopdata/tmp
Format the NameNode
Format the HDFS filesystem to initialize the NameNode metadata store:
hdfs namenode -format
Look for this line in the output to confirm success:
Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
Critical warning: Only format the NameNode once on a cluster. If you format it again after data exists, you create a new cluster ID. The DataNode will refuse to connect because its stored cluster ID no longer matches the NameNode, and you will lose all HDFS data.
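Each daemon records its cluster ID in a plain-text VERSION file under its storage directory (the paths from hdfs-site.xml above), so comparing the two lines is a quick way to diagnose this exact mismatch; a sketch:

```shell
# Print the clusterID line from each daemon's VERSION file, if present.
for f in ~/hadoopdata/hdfs/namenode/current/VERSION \
         ~/hadoopdata/hdfs/datanode/current/VERSION; do
    if [ -f "$f" ]; then
        echo "$f:"
        grep clusterID "$f"
    else
        echo "$f not found (daemon not formatted or started yet)"
    fi
done
```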
Step 9: Start Hadoop Services and Verify the Cluster
Start HDFS and YARN
Start the HDFS daemons first (NameNode, DataNode, SecondaryNameNode):
start-dfs.sh
Then start the YARN daemons (ResourceManager, NodeManager):
start-yarn.sh
Verify with JPS
The jps command lists all running Java processes. On a healthy single-node Hadoop cluster, you should see the five Hadoop daemons below, plus the Jps process itself:
jps
Expected output:
XXXX NameNode
XXXX DataNode
XXXX SecondaryNameNode
XXXX ResourceManager
XXXX NodeManager
XXXX Jps
If any daemon is missing, open its log file under $HADOOP_HOME/logs/ and look for ERROR or FATAL lines.
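The jps check can be scripted; this sketch loops over the expected daemons and reports any that are not running (it assumes jps, which ships with the JDK, is on PATH):

```shell
# Compare the jps output against the list of expected daemons.
running=$(jps 2>/dev/null)
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$running" | grep -qw "$d"; then
        echo "OK:      $d"
    else
        echo "MISSING: $d (check \$HADOOP_HOME/logs/)"
    fi
done
```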
Access the Web Interfaces
Open a browser on your Fedora 43 machine and visit:
- HDFS NameNode UI: http://localhost:9870 — shows filesystem health, block reports, live DataNodes
- YARN ResourceManager UI: http://localhost:8088 — shows cluster resources, running and completed jobs
If either URL does not load, open the ports through Fedora’s firewall:
sudo firewall-cmd --add-port=9870/tcp --permanent
sudo firewall-cmd --add-port=8088/tcp --permanent
sudo firewall-cmd --reload
Step 10: Run a MapReduce Word Count Job to Validate the Setup
A working Web UI proves the daemons are alive. A successful MapReduce job proves the entire pipeline — HDFS read/write, YARN scheduling, and MapReduce execution — works end to end.
Create an HDFS input directory:
hdfs dfs -mkdir -p /user/hadoop/input
Create a test file and upload it to HDFS:
echo "Apache Hadoop on Fedora 43 Linux big data setup configure" > testfile.txt
hdfs dfs -put testfile.txt /user/hadoop/input/
Run the built-in WordCount example that ships with Hadoop:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
View the results:
hdfs dfs -cat /user/hadoop/output/part-r-00000
Expected output:
43 1
Apache 1
Fedora 1
Hadoop 1
Linux 1
big 1
configure 1
data 1
on 1
setup 1
If you see word counts in the output, your Apache Hadoop on Fedora 43 setup is fully operational.
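One common follow-up snag: MapReduce refuses to write into an output directory that already exists, so when re-running the same job, remove the old output first:

```shell
# Delete the previous run's output, then resubmit the job.
hdfs dfs -rm -r /user/hadoop/output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
```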
Troubleshooting Common Hadoop Installation Errors on Fedora 43
Even with a correct setup, a few issues come up regularly. Here are the most frequent ones and exactly how to fix them.
Error: JAVA_HOME is not set when starting daemons
- Cause: JAVA_HOME was added to ~/.bashrc but not to hadoop-env.sh
- Fix: Open $HADOOP_HOME/etc/hadoop/hadoop-env.sh and hardcode the full JAVA_HOME path. The daemon startup scripts do not inherit shell variables from ~/.bashrc.
Error: ssh: connect to host localhost port 22: Connection refused
- Cause: The sshd service is not running
- Fix: Run sudo systemctl start sshd and sudo systemctl enable sshd to start it and persist it across reboots.
Error: DataNode fails to start (cluster ID mismatch)
- Cause: The NameNode was formatted more than once after the DataNode had already written data
- Fix: Delete the DataNode data directory contents and restart: rm -rf ~/hadoopdata/hdfs/datanode/* && start-dfs.sh. Do not format the NameNode again — just restart the services.
Error: hdfs namenode -format fails with hostname resolution error
- Cause: The system hostname is not resolvable to 127.0.0.1 in /etc/hosts
- Fix: Open /etc/hosts and confirm you have this line: 127.0.0.1 localhost. Add your machine’s short hostname on the same line if needed.
Error: Port 9870 or 8088 not accessible in browser
- Cause: Fedora’s firewalld is blocking the ports
- Fix: Use the firewall-cmd commands shown in Step 9 to open ports 9870 and 8088, then reload the firewall rules.
Congratulations! You have successfully installed Apache Hadoop on Fedora 43 in pseudo-distributed mode. Thanks for using this tutorial. For additional help or deeper configuration details, check the official Apache Hadoop documentation.