Configure Hadoop Distributed Storage Cluster using Ansible Automation
We need automation whenever a repetitive task has to be performed over and over again, and when it comes to configuration management we usually think of Ansible.
In this article, I’ve configured a Hadoop multi-node cluster by writing a simple Ansible playbook, which should be easy for an Ansible beginner to comprehend and configure on their own.
Prerequisite:
- The system should have the Java JDK and the Hadoop software installed before running the ansible-playbook.
Once the installation is done, we move on to the next part: writing the Ansible playbook. Configuring the Hadoop distributed storage (DS) cluster involves five basic steps.
Name Node/ Master Node
STEP 1
First, we specify the hosts where we want to configure the name node (master node), and then we list the tasks that complete the Name Node configuration.
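The skeleton of such a play might look like the sketch below; the inventory group name `namenode` is an assumption and may differ in your inventory file.

```yaml
# Minimal playbook skeleton (sketch). The group name "namenode"
# is assumed; replace it with whatever your inventory defines.
- hosts: namenode
  become: yes
  tasks:
    # Name Node configuration tasks go here
```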
STEP 2
Second, we create a directory where the metadata will be stored, using the file module provided by Ansible.
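A task for this step could look like the following; the path `/nn` is a hypothetical choice for the metadata directory, not something fixed by Hadoop.

```yaml
# Hypothetical task: create the Name Node metadata directory.
# The path /nn is an assumption — pick any path you prefer.
- name: Create Name Node metadata directory
  file:
    path: /nn
    state: directory
    mode: '0755'
```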
STEP 3
Third, since we are writing a playbook, we already know what needs to be configured. In the hdfs-site.xml (/etc/hadoop/hdfs-site.xml) and core-site.xml (/etc/hadoop/core-site.xml) files, we write the configuration for the name node as property name/value pairs.
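One way to push these two files is with Ansible's copy module and inline content, as sketched below. The metadata path `/nn` and the port `9001` are assumptions; adjust them to your own setup.

```yaml
# Sketch: write the two Hadoop config files from the playbook.
# /nn and port 9001 are assumed values, not requirements.
- name: Configure hdfs-site.xml
  copy:
    dest: /etc/hadoop/hdfs-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.name.dir</name>
          <value>/nn</value>
        </property>
      </configuration>

- name: Configure core-site.xml
  copy:
    dest: /etc/hadoop/core-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://0.0.0.0:9001</value>
        </property>
      </configuration>
```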
STEP 4
Now we format the name node and start the service. Formatting the namenode wipes the metadata about the data nodes: all information about the data stored on the datanodes is lost, so they can be reused for new data.
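These two actions can be expressed as command tasks, as in the sketch below. Note that formatting should only happen on the first setup; rerunning it on an existing cluster destroys the metadata, as explained above.

```yaml
# Sketch: format the Name Node (first setup only!) and start
# the daemon using the classic hadoop-daemon.sh script.
- name: Format the Name Node
  command: hadoop namenode -format -force

- name: Start the Name Node daemon
  command: hadoop-daemon.sh start namenode
```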
Finally, run your ansible-playbook
Data Node/ Slave Node
For the Data Node we have one step fewer than for the namenode. Here is what we need to configure in the hdfs-site.xml and core-site.xml files.
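The data node play mirrors the name node play minus the format step. In the sketch below, the storage path `/dn`, the inventory group `datanode`, and the placeholder `NAMENODE_IP` are all assumptions you would replace with your own values.

```yaml
# Sketch of the data node play. /dn, the "datanode" group, and
# NAMENODE_IP are placeholders — substitute your own values.
- hosts: datanode
  become: yes
  tasks:
    - name: Create Data Node storage directory
      file:
        path: /dn
        state: directory

    - name: Configure hdfs-site.xml
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>

    - name: Configure core-site.xml
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://NAMENODE_IP:9001</value>
            </property>
          </configuration>

    - name: Start the Data Node daemon
      command: hadoop-daemon.sh start datanode
```

Pointing fs.default.name at the name node's address is what connects the data node to the cluster.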
The rest of the process is the same as for the Name Node.
Now we just have to run the ansible-playbook command, and we are done.
Thanks for reading!