How to install Hadoop on Ubuntu 20.04?

Last updated 1 year ago

Overview

Hadoop is an open-source framework for storing and processing large data sets in a distributed computing environment. It runs on low-cost commodity hardware, which keeps it affordable for businesses. Its two core components are HDFS and MapReduce: the Hadoop Distributed File System (HDFS) stores data across multiple servers in a cluster, while MapReduce is a processing engine that distributes computation over large data sets across that cluster.

In this tutorial, we'll walk you through installing Hadoop on Ubuntu 20.04.
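
As a rough local analogy for the MapReduce model described above (this is not Hadoop code), a shell pipeline can show the three phases on a single machine: map each input line to words, shuffle identical words together, and reduce each group to a count. The function name word_count is made up for this illustration.

```shell
# word_count is a hypothetical helper used only to illustrate the model.
word_count() {
  tr -s '[:space:]' '\n' |      # map: emit one word per line
    sed '/^$/d' |               # drop empty lines
    sort |                      # shuffle: bring identical words together
    uniq -c |                   # reduce: count each group
    awk '{ print $2, $1 }'      # print "word count"
}

printf 'big data big cluster\n' | word_count
```

Hadoop applies the same idea, but spreads the map and reduce work over many machines and reads the input from HDFS.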

Prerequisites

There are certain prerequisites that need to be met before you begin:

  • Ubuntu 20.04 server configured on your system

  • A regular (non-root) user with sudo privileges

  • Internet connection

Key

  • Red box- Input

  • Green box- Output

Get Started

Step 1: Java Installation

  • First, update your system by opening the terminal and running the following commands:

sudo -i 
sudo apt-get update && sudo apt-get upgrade 
  • Install OpenJDK 11 (or another Java version supported by your Hadoop release) by running the following command:

sudo apt-get install openjdk-11-jdk 
  • Once the installation is complete, let's verify it with the following command:

java -version 
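
If you want to script this verification, a minimal sketch could look like the following. The helper name check_java is an assumption made for this example; it only reports whether a java binary is on the PATH and prints the first line of its version banner.

```shell
# check_java is a hypothetical helper name used only for this sketch.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java -version writes its banner to stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
  else
    echo "java not found on PATH"
  fi
}

check_java
```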

Step 2: Creating Hadoop User

  • We will now create a dedicated user using the 'adduser' command.

sudo adduser hadoopuser 

You will be asked for a password for the new user, followed by optional details such as the full name. When asked to confirm that the information is correct, type "Y".

  • To switch to the Hadoop user that we have created, which in this case is 'hadoopuser', we need to run the following command:

su - hadoopuser 
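
Before creating the account, you may want to check whether it already exists. The sketch below uses a made-up helper named user_exists, which relies on the standard id command; it only reports the state and does not create anything.

```shell
# user_exists is a hypothetical helper: succeeds if the named account exists.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

if user_exists hadoopuser; then
  echo "hadoopuser already exists"
else
  echo "hadoopuser not found; create it with: sudo adduser hadoopuser"
fi
```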

Step 3: Generating Public and Private key-pairs

  • Once that's done, we can generate the private and public key pairs using the command below:

ssh-keygen -t rsa 

When prompted, press Enter to accept the default file location for the key pair, or specify another one. You will also be asked for a passphrase; leave it empty if you want the Hadoop start scripts to connect over SSH without prompting.

  • Once the key pair has been generated, add the public key to the SSH authorized_keys file, using the command below:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
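
The same two commands can be run without prompts. The sketch below is an assumption-laden illustration: it writes to a scratch directory (a stand-in for ~/.ssh) so that no real keys are touched, and it skips the work if ssh-keygen is not installed.

```shell
scratch_ssh="$(mktemp -d)"   # stand-in for ~/.ssh so real keys are untouched

if command -v ssh-keygen >/dev/null 2>&1; then
  # -N '' sets an empty passphrase, -f names the private key file, -q keeps it quiet.
  ssh-keygen -q -t rsa -N '' -f "$scratch_ssh/id_rsa"
  # Authorize the public half of the key pair.
  cat "$scratch_ssh/id_rsa.pub" >> "$scratch_ssh/authorized_keys"
  echo "authorized_keys written to $scratch_ssh"
else
  echo "ssh-keygen not installed; skipping"
fi
```

To apply this to the Hadoop user for real, replace the scratch directory with ~/.ssh.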

Step 4: Permitting and Authorizing key-pairs

  • Now that the public key is in the authorized_keys file, set the file permissions to 640. This gives the file owner (the Hadoop user) read and write permissions and the group read-only permission, with no permissions for other users.

chmod 640 ~/.ssh/authorized_keys 
  • Authenticate the localhost, using the following command:

ssh localhost 
  • Enter yes to continue.
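
To see what mode 640 looks like on disk, the short sketch below applies it to a temporary file (a stand-in for ~/.ssh/authorized_keys) and prints the resulting permission string.

```shell
demo_file="$(mktemp)"        # stand-in for ~/.ssh/authorized_keys
chmod 640 "$demo_file"

# 6 = read+write for the owner, 4 = read for the group, 0 = nothing for others.
ls -l "$demo_file" | cut -c1-10     # prints -rw-r-----
```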

Step 5: Downloading and Installing Hadoop

  • Download the Hadoop release archive with the wget command below. You may choose a different release; here, we use the stable version 3.2.4. If the link returns a 404 error, older releases are kept under https://archive.apache.org/dist/hadoop/common/.

wget https://downloads.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz 

  • Once the download is complete, extract the hadoop-3.2.4.tar.gz file using the tar command.

tar -xvzf hadoop-3.2.4.tar.gz 

  • You can rename the extracted directory using the command provided below:

mv hadoop-3.2.4 hadoop 
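
It is good practice to verify a downloaded archive before extracting it. Apache releases are normally published with a .sha512 file next to the tarball; the exact layout of that file may differ from what sha512sum writes, so treat the following as a sketch of the mechanism only. It uses a small placeholder file instead of the real tarball, and the file names are made up.

```shell
work_dir="$(mktemp -d)"

(
  cd "$work_dir" || exit 1
  # Placeholder standing in for the downloaded tarball.
  printf 'stand-in for hadoop-3.2.4.tar.gz\n' > demo.tar.gz
  # In practice the checksum file comes from the release site, not from you.
  sha512sum demo.tar.gz > demo.tar.gz.sha512
  # Recompute the hash and compare it with the recorded one.
  sha512sum -c demo.tar.gz.sha512
)
```

If the file had been altered after the checksum was recorded, sha512sum -c would report a failure and exit with a non-zero status.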

Step 6: Setting up Hadoop

  • Next, configure the Java environment variables that Hadoop needs. Start by finding the Java installation path, which will be the value of JAVA_HOME:

dirname $(dirname $(readlink -f $(which java))) 

  • Open the "~/.bashrc" file in a text editor of your choice. Here, we use the nano text editor.

nano ~/.bashrc 
  • Add the following lines to the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Once you've added them, save the changes by pressing CTRL+O, then exit the editor with CTRL+X.

  • Use the following command to load the new environment variables into the current session:

source ~/.bashrc 
  • Open the Hadoop environment file and set JAVA_HOME for the Hadoop environment:

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh 
  • In that file, add (or uncomment and edit) the following line, then save and exit:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 
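
The environment setup in this step can also be scripted. The sketch below is one possible approach, not the only one: detect_java_home wraps the path lookup used earlier in this step, and append_once adds a line to a file only if it is not already there, so the script can be re-run safely. Both helper names are made up, and a temporary file stands in for ~/.bashrc.

```shell
# Hypothetical helper: print the Java installation directory, or fail if java is absent.
detect_java_home() {
  java_bin="$(command -v java)" || return 1
  # Resolve symlinks, then strip the trailing /bin/java.
  dirname "$(dirname "$(readlink -f "$java_bin")")"
}

# Hypothetical helper: append LINE to FILE unless an identical line already exists.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

rc_file="$(mktemp)"   # stand-in for ~/.bashrc

append_once "$rc_file" 'export HADOOP_HOME=/home/hadoopuser/hadoop'
append_once "$rc_file" 'export HADOOP_HOME=/home/hadoopuser/hadoop'   # second call is a no-op

if java_home="$(detect_java_home)"; then
  append_once "$rc_file" "export JAVA_HOME=$java_home"
fi

cat "$rc_file"
```

Detecting the path instead of hard-coding it avoids a mismatch when a different Java version is installed.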

Step 7: Configuring Hadoop for Ubuntu

  • Before configuring Hadoop, create two directories, "namenode" and "datanode", inside the home directory of the Hadoop user, using the following commands:

mkdir -p ~/hadoopdata/hdfs/namenode 
mkdir -p ~/hadoopdata/hdfs/datanode 

  • The core-site.xml file needs the address of the NameNode host. Begin by confirming your system hostname using the command below. For a single-node setup, as in this tutorial, localhost is sufficient.

hostname 

  • Then, open the core-site.xml file in the nano editor.

nano $HADOOP_HOME/etc/hadoop/core-site.xml 
  • Configure the core-site.xml file by adding the following lines:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
  • Next, open the hdfs-site.xml file:

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml 
  • Configure the hdfs-site.xml file by adding the following lines:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoopuser/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoopuser/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
  • Open the MapReduce configuration file with the following command:

nano $HADOOP_HOME/etc/hadoop/mapred-site.xml 
  • Configure the mapred-site.xml file by adding the following lines:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Open the YARN configuration file with the following command:

nano $HADOOP_HOME/etc/hadoop/yarn-site.xml 
  • Configure the yarn-site.xml file by adding the following lines:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
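
The data directories created at the start of this step and the paths written into hdfs-site.xml must match, otherwise the NameNode cannot find its storage. One way to keep them in sync is to derive both from a single variable. The sketch below does this in a scratch directory; conf_dir and data_root are made-up names standing in for $HADOOP_HOME/etc/hadoop and ~/hadoopdata/hdfs.

```shell
conf_dir="$(mktemp -d)"                  # stand-in for $HADOOP_HOME/etc/hadoop
data_root="$conf_dir/hadoopdata/hdfs"    # stand-in for ~/hadoopdata/hdfs

# Create the directories first, from the same variable used in the config.
mkdir -p "$data_root/namenode" "$data_root/datanode"

cat > "$conf_dir/hdfs-site.xml" <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>$data_root/datanode</value>
  </property>
</configuration>
EOF

# Report whether every absolute path named in the config exists on disk.
grep -o '<value>/[^<]*</value>' "$conf_dir/hdfs-site.xml" |
  sed 's/<[^>]*>//g' |
  while read -r dir; do
    if [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
    fi
  done
```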

Step 8: Running Hadoop Cluster

  • Format the Hadoop file system by running the following command:

hdfs namenode -format 

  • Start the Hadoop daemons by running the following commands:

start-dfs.sh 

start-yarn.sh 

  • Verify that the Hadoop daemons are running with the following command. The output should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

jps 

  • To reach the Hadoop web interfaces on ports 8088 and 9870 from another machine, allow these ports through the firewall, using the following commands:

sudo ufw allow 8088 
sudo ufw allow 9870 

Note: If you see an error such as "hadoopuser is not in the sudoers file", the Hadoop user does not have sudo privileges. In that case, follow the steps below:

  • Log out of the Hadoop user with the following command:

logout  
  • Switch to the root user with the following command:

sudo -i 

  • Then run the two ufw commands above again.

To access the Hadoop web interfaces, open your web browser and enter your server's IP address followed by the port number: port 9870 for the NameNode interface (http://your-server-ip:9870) and port 8088 for the YARN ResourceManager interface (http://your-server-ip:8088).
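
If the web interfaces do not load, the jps check from this step is a good place to start. The sketch below automates it with a made-up helper, check_daemons, which reads jps-style lines (a process ID followed by a name) on standard input and reports which of the five expected daemons are present.

```shell
# check_daemons is a hypothetical helper; it reads "<pid> <name>" lines on stdin.
check_daemons() {
  listing="$(cat)"
  rc=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly so NameNode does not match SecondaryNameNode.
    if printf '%s\n' "$listing" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "running: $daemon"
    else
      echo "missing: $daemon"
      rc=1
    fi
  done
  return "$rc"
}

if command -v jps >/dev/null 2>&1; then
  jps | check_daemons || echo "some daemons are not running"
else
  echo "jps not found; is a JDK installed?"
fi
```

A missing daemon usually means its start script failed; the log files under $HADOOP_HOME/logs are the next place to look.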

Conclusion

Hadoop is now successfully installed. You may start efficiently storing and managing your large datasets.