How to install Hadoop on Ubuntu 20.04?
Hadoop is an open-source framework used to store and process large data sets in a distributed computing environment. It runs on low-cost commodity hardware, making it affordable for businesses. Hadoop has two major core components: HDFS and MapReduce. The Hadoop Distributed File System (HDFS) stores data across multiple servers in a cluster, while MapReduce is a processing engine that enables distributed processing of large data sets across the cluster.
In this tutorial, we'll walk you through the process of installing Hadoop on Ubuntu 20.04.
There are certain prerequisites that need to be met before you begin:
Ubuntu 20.04 server configured on your system
A regular (non-root) user with sudo privileges
Internet connection
Step 1: Installing Java
First, update your system by opening the terminal and running the following commands:
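# Refresh the package lists and apply any available upgrades
sudo apt update
sudo apt upgrade -y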
Install OpenJDK 11 (or any later version of your choice) by running the following command:
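sudo apt install openjdk-11-jdk -y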
Once the installation is complete, let's verify it with the following command:
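java -version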
Step 2: Creating Hadoop User
We will now create a dedicated user using the 'adduser' command.
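# 'hadoopuser' is the username used throughout this tutorial
sudo adduser hadoopuser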
This will prompt you to enter the new user's password, full name, and other relevant information. To confirm that the details entered are correct, type "y" (or "Y").
To switch to the Hadoop user that we have created, which in this case is 'hadoopuser', we need to run the following command:
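su - hadoopuser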
Step 3: Generating Public and Private Key Pairs
Once that's done, we can generate the private and public key pairs using the command below:
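# Generate an RSA key pair for the Hadoop user
ssh-keygen -t rsa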
When prompted, press ENTER to accept the default file location for the key pair. You can leave the passphrase empty so that the Hadoop daemons can log in over SSH without a prompt.
Once the key pairs have been generated, we can add them to the SSH authorized_keys, using the command below:
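# Append the public key (default file name id_rsa.pub) to the authorized keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys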
Step 4: Permitting and Authorizing Key Pairs
Since we have saved the key pair to the SSH authorized keys, we will now need to update the file permissions to 640. This will ensure that only the file owner (i.e., us) will have both read and write permissions, while the group will only have read permissions. No permissions will be granted to other users.
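chmod 640 ~/.ssh/authorized_keys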
Authenticate the localhost, using the following command:
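ssh localhost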
Enter yes to continue.
Step 5: Downloading and Installing Hadoop
To download the Hadoop framework, use the wget command shown below. You may download any recent release; here, we use the stable Hadoop version 3.2.4.
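The URL below points to the Apache release archive, which keeps older releases such as 3.2.4; newer releases can be fetched from the mirrors linked on the Apache Hadoop downloads page.

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz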
Once the download is complete, extract the hadoop-3.2.4.tar.gz file using the tar command.
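tar -xvzf hadoop-3.2.4.tar.gz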
You can rename the extracted directory using the command provided below:
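# The rest of this tutorial assumes the extracted directory is renamed to 'hadoop'
mv hadoop-3.2.4 hadoop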
Step 6: Setting up Hadoop
Next, you will need to configure the Java environment variables to set up Hadoop. Start by finding the Java installation path that JAVA_HOME should point to.
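One way to locate it, assuming Java was installed through apt as in Step 1:

# Resolves the symlink behind the 'java' binary and strips the trailing /bin
# Typically prints /usr/lib/jvm/java-11-openjdk-amd64 for OpenJDK 11
dirname $(dirname $(readlink -f $(which java)))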
Open ~/.bashrc" file in the text editor of your choice. Here, we are using nano text editor.
Open the file "~/.bashrc" and add the specified paths.
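nano ~/.bashrc

Then add the following lines at the end of the file. The HADOOP_HOME path assumes Hadoop was extracted into the hadoopuser home directory and renamed to "hadoop" in Step 5:

export HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"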
Once you've added them, save the changes by pressing CTRL+O, then exit the editor with CTRL+X.
Use the following command to load the new environment variables, including JAVA_HOME, into the current session:
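source ~/.bashrc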
Open the environment variable file for Hadoop and configure the JAVA_HOME variable for the Hadoop environment.
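nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Find the JAVA_HOME line, uncomment it, and set it to the path found earlier. The value below assumes OpenJDK 11 on 64-bit Ubuntu:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64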
Step 7: Configuring Hadoop for Ubuntu
To configure Hadoop properly, you need to create two directories, named "namenode" and "datanode", inside the Hadoop user's home directory, using the following commands:
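One common layout is shown below; the exact paths are a convention, but they must match the hdfs-site.xml settings later in this step:

mkdir -p ~/hadoopdata/hdfs/namenode
mkdir -p ~/hadoopdata/hdfs/datanode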
To update the Hadoop core-site.xml file, you will need to add your hostname. Begin by confirming your system hostname using the command provided below.
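hostname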
Then, open the core-site.xml file in the nano editor.
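nano $HADOOP_HOME/etc/hadoop/core-site.xml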
Configure the core-site.xml file by adding the following lines:
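A minimal single-node configuration is shown below; replace your-hostname with the name printed by the hostname command (port 9000 is the conventional choice for the HDFS endpoint):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://your-hostname:9000</value>
  </property>
</configuration>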
Configure the hdfs-site.xml file by adding the following lines:
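Open the file in the nano editor as before:

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

A replication factor of 1 suits a single-node cluster, and the two directory paths must match the directories created earlier in this step:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoopuser/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoopuser/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>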
Open the MapReduce configuration file with the following command:
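nano $HADOOP_HOME/etc/hadoop/mapred-site.xml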
Configure the mapred-site.xml file by adding the following lines:
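This minimal configuration tells MapReduce jobs to run on YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>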
Open the YARN configuration file with the following command:
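nano $HADOOP_HOME/etc/hadoop/yarn-site.xml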
Configure the yarn-site.xml file by adding the following lines:
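This enables the shuffle service that MapReduce needs on each NodeManager:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>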
Step 8: Running the Hadoop Cluster
Format the Hadoop file system by running the following command:
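hdfs namenode -format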
Start the Hadoop daemons by running the following commands:
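# Start HDFS (NameNode, DataNode, SecondaryNameNode)
start-dfs.sh
# Start YARN (ResourceManager, NodeManager)
start-yarn.sh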
Verify that the Hadoop daemons are running by running the following command:
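# Lists running Java processes; you should see NameNode, DataNode,
# SecondaryNameNode, ResourceManager, and NodeManager
jps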
To enable Hadoop to listen at ports 8088 and 9870, you will need to allow these ports through the firewall, using the following commands:
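# Assuming UFW, Ubuntu's default firewall front end
sudo ufw allow 9870/tcp
sudo ufw allow 8088/tcp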
Note: If you face an error like "user is not in the sudoers file" while running the firewall commands, follow the steps below:
Log out from the current user using the following command:
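exit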
Switch to the root user using the following command:
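su -
# Once logged in as root, one common fix (an addition to the original steps)
# is to add the Hadoop user to the sudo group, then switch back and retry:
usermod -aG sudo hadoopuser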
To access your Hadoop "namenode" and resource manager web interfaces, open your web browser and enter your server's IP address followed by port 9870 or 8088, respectively:
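Replace your-server-ip below with your machine's actual IP address:

http://your-server-ip:9870 (HDFS NameNode web UI)
http://your-server-ip:8088 (YARN ResourceManager web UI)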
Hadoop is now successfully installed. You may start efficiently storing and managing your large datasets.