Configure Hadoop Distributed Storage Cluster using Ansible Automation

Samkit Shah
3 min read · Nov 30, 2020

We need automation whenever a task has to be performed over and over again, and when it comes to configuration management, Ansible is usually the first tool that comes to mind.

In this article, I configure a multi-node Hadoop cluster with a simple Ansible playbook that a beginner in Ansible should be able to understand and run on their own.

Prerequisite:

  • The system should have the Java JDK and Hadoop installed before running the ansible-playbook. Both can be confirmed by printing the Java version and the Hadoop version on each host.
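The prerequisite check can also be done from Ansible itself. The following is a minimal sketch, not the playbook from this article: the host pattern `all` is a stand-in for whatever hosts your inventory defines, and both commands are assumed to be on the PATH of the remote user.

```yaml
# Sketch only: "all" is a stand-in host pattern, adjust it to your inventory.
- name: Check Hadoop prerequisites
  hosts: all
  gather_facts: false
  tasks:
    - name: Check that Java is installed
      ansible.builtin.command: java -version
      changed_when: false   # read-only check, never report a change

    - name: Check that Hadoop is installed
      ansible.builtin.command: hadoop version
      changed_when: false
```

If either command is missing, the task fails and the play stops on that host, which is the behaviour we want before touching any configuration.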

Once the installation is done, we move on to writing the Ansible playbook. Configuring the Hadoop distributed storage cluster involves the basic steps below.

Name Node / Master Node

STEP 1

First, we specify the host on which we want to configure the name node (master node), and then we list the tasks that complete the name node configuration.
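As a rough illustration of this step, a play header might look like the sketch below. The inventory group name `namenode` is hypothetical, and `become: true` assumes root privileges are needed to write under /etc/hadoop.

```yaml
# Sketch only: "namenode" is a hypothetical inventory group.
- name: Configure the Hadoop name node
  hosts: namenode
  become: true           # assumption: root is needed for /etc/hadoop
  tasks:
    - name: Confirm the name node host is reachable
      ansible.builtin.ping:
```

The remaining steps would be added as further entries under `tasks`.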

STEP 2

Second, we create the folder where the metadata will be stored, using the file module provided by Ansible.
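A minimal sketch of this step with the file module follows. The directory path `/nn` and the group name `namenode` are assumptions made for illustration; the article does not state the path it used.

```yaml
# Sketch only: /nn and the "namenode" group are hypothetical.
- name: Create the name node metadata folder
  hosts: namenode
  become: true
  tasks:
    - name: Create the metadata directory
      ansible.builtin.file:
        path: /nn
        state: directory   # create the folder if it does not exist yet
```

Because the file module is idempotent, running this play again leaves an existing folder untouched.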

STEP 3

Third, since we are writing a playbook, we already know what needs to be configured. In the hdfs-site.xml (/etc/hadoop/hdfs-site.xml) and core-site.xml (/etc/hadoop/core-site.xml) files, we write the configuration for the name node as properties, each with a name and a value.

[Screenshots: hdfs-site.xml file, core-site.xml file, and the namenode playbook]
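Since the screenshots are not reproduced here, the sketch below shows one way the two files could be written from a playbook using the copy module with inline content. It is not the author's exact playbook. The property names `dfs.name.dir` and `fs.default.name` are the older Hadoop names (newer releases call them `dfs.namenode.name.dir` and `fs.defaultFS`), and the directory `/nn`, the port `9001`, and the group name `namenode` are all assumptions.

```yaml
# Sketch only: /nn, port 9001 and the "namenode" group are hypothetical.
- name: Write the name node configuration files
  hosts: namenode
  become: true
  tasks:
    - name: Write hdfs-site.xml (where the metadata is stored)
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/nn</value>
            </property>
          </configuration>

    - name: Write core-site.xml (address the name node listens on)
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://0.0.0.0:9001</value>
            </property>
          </configuration>
```

Each property is a name and value pair, exactly as described above. Keeping the XML inline keeps the example in one file; a separate template file would work just as well.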

STEP 4

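Now we format the name node and start the service. Formatting the name node initializes its metadata folder, which holds the record of which files exist and where their blocks live on the data nodes. Any earlier metadata is erased, so data stored previously is no longer reachable and the cluster starts out empty, ready for new data.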
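A hedged sketch of the format and start step is shown below. It assumes an older Hadoop release where `hadoop namenode -format` and `hadoop-daemon.sh start namenode` are the usual commands (newer releases favour `hdfs namenode -format` and `hdfs --daemon start namenode`), that both scripts and `jps` are on the PATH, and that the metadata folder is the hypothetical `/nn`.

```yaml
# Sketch only: commands match older Hadoop releases; /nn is hypothetical.
- name: Format and start the name node
  hosts: namenode
  become: true
  tasks:
    - name: Format the name node (skipped once the folder is formatted)
      ansible.builtin.command:
        cmd: hadoop namenode -format
        creates: /nn/current/VERSION   # written by a successful format

    - name: List running Java processes
      ansible.builtin.command: jps
      register: jps_out
      changed_when: false

    - name: Start the name node service if it is not running
      ansible.builtin.command: hadoop-daemon.sh start namenode
      when: "jps_out.stdout is not search('[0-9]+ NameNode')"
```

The `creates` guard matters because formatting a second time would wipe the metadata of a cluster that is already in use.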

Finally, run the playbook with the ansible-playbook command.

Data Node / Slave Node

For the data node we have one step fewer than for the name node, because a data node is not formatted. Here is what we need to configure in the hdfs-site.xml and core-site.xml files:

[Screenshots: hdfs-site.xml file and core-site.xml file for the data node]
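As with the name node, the data node files could be written along the lines of the sketch below. Everything specific here is an assumption: the group name `datanode`, the storage folder `/dn`, the port `9001`, and the name node address `192.0.2.10` (a documentation-only placeholder). The property names `dfs.data.dir` and `fs.default.name` are the older Hadoop names; newer releases call them `dfs.datanode.data.dir` and `fs.defaultFS`.

```yaml
# Sketch only: group name, /dn, the IP address and the port are hypothetical.
- name: Write the data node configuration files
  hosts: datanode
  become: true
  vars:
    namenode_ip: 192.0.2.10   # placeholder, replace with your name node IP
  tasks:
    - name: Write hdfs-site.xml (where the data blocks are stored)
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>

    - name: Write core-site.xml (address of the name node to contact)
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ namenode_ip }}:9001</value>
            </property>
          </configuration>
```

The main difference from the name node is that core-site.xml points at the name node's address rather than listening on the local one.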

The rest of the process is the same as for the name node.

[Screenshot: datanode playbook]
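The remaining data node tasks mirror the name node ones, minus the format step. The sketch below assumes the hypothetical storage folder `/dn`, a hypothetical `datanode` inventory group, and the older `hadoop-daemon.sh` start script being on the PATH along with `jps`.

```yaml
# Sketch only: /dn and the "datanode" group are hypothetical.
- name: Prepare and start the data node
  hosts: datanode
  become: true
  tasks:
    - name: Create the data storage directory
      ansible.builtin.file:
        path: /dn
        state: directory

    - name: List running Java processes
      ansible.builtin.command: jps
      register: jps_out
      changed_when: false

    - name: Start the data node service if it is not running
      ansible.builtin.command: hadoop-daemon.sh start datanode
      when: "jps_out.stdout is not search('[0-9]+ DataNode')"
```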

Now we just have to run the ansible-playbook command, and we are done.

Thanks for reading!
