Configure Hadoop Distributed Storage Cluster using Ansible Automation
We need automation whenever a repetitive task has to be performed over and over again, and when it comes to configuration management we usually think of Ansible.
In this article, I’ve configured a Hadoop multi-node cluster by writing a simple Ansible playbook, which should be easy for an Ansible beginner to comprehend and configure on their own.
Prerequisite:
- The system should have the Java JDK and the Hadoop software installed before running the ansible-playbook.
Once the installation is done, we move on to the next part: writing the Ansible playbook. Configuring the Hadoop distributed storage (DS) cluster involves five basic steps.
Name Node/ Master Node
STEP 1
First, we specify the hosts where we want to configure the name node (master node), and then we list the tasks that complete the Name Node configuration.
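The skeleton of such a play might look like the sketch below; the inventory group name `namenode` is an assumption and may differ in your inventory file.

```yaml
# Minimal playbook skeleton (sketch). The group name "namenode"
# is assumed; replace it with whatever your inventory defines.
- hosts: namenode
  become: yes
  tasks:
    # Name Node configuration tasks go here
```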
STEP 2
Second, we create a directory where the metadata will be stored, using the file module provided by Ansible.
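A task for this step could look like the following; the path `/nn` is a hypothetical choice for the metadata directory, not something fixed by Hadoop.

```yaml
# Hypothetical task: create the Name Node metadata directory.
# The path /nn is an assumption — pick any path you prefer.
- name: Create Name Node metadata directory
  file:
    path: /nn
    state: directory
    mode: '0755'
```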
STEP 3
Third, since we are writing a playbook, we already know what needs to be configured. In the hdfs-site.xml (/etc/hadoop/hdfs-site.xml) and core-site.xml (/etc/hadoop/core-site.xml) files, we write the configuration for the name node as property name/value pairs.
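One way to push these two files is with Ansible's copy module and inline content, as sketched below. The metadata path `/nn` and the port `9001` are assumptions; adjust them to your own setup.

```yaml
# Sketch: write the two Hadoop config files from the playbook.
# /nn and port 9001 are assumed values, not requirements.
- name: Configure hdfs-site.xml
  copy:
    dest: /etc/hadoop/hdfs-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.name.dir</name>
          <value>/nn</value>
        </property>
      </configuration>

- name: Configure core-site.xml
  copy:
    dest: /etc/hadoop/core-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://0.0.0.0:9001</value>
        </property>
      </configuration>
```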
STEP 4
Now we format the name node and start the service. Formatting the namenode wipes the metadata about the data nodes: all information about the data stored on the datanodes is lost, so they can be reused for new data.
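These two actions can be expressed as command tasks, as in the sketch below. Note that formatting should only happen on the first setup; rerunning it on an existing cluster destroys the metadata, as explained above.

```yaml
# Sketch: format the Name Node (first setup only!) and start
# the daemon using the classic hadoop-daemon.sh script.
- name: Format the Name Node
  command: hadoop namenode -format -force

- name: Start the Name Node daemon
  command: hadoop-daemon.sh start namenode
```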
Finally, run your ansible-playbook
Data Node/ Slave Node
For the Data Node we have one step fewer than for the namenode. Here is what we need to configure in the hdfs-site.xml and core-site.xml files.
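The data node play mirrors the name node play minus the format step. In the sketch below, the storage path `/dn`, the inventory group `datanode`, and the placeholder `NAMENODE_IP` are all assumptions you would replace with your own values.

```yaml
# Sketch of the data node play. /dn, the "datanode" group, and
# NAMENODE_IP are placeholders — substitute your own values.
- hosts: datanode
  become: yes
  tasks:
    - name: Create Data Node storage directory
      file:
        path: /dn
        state: directory

    - name: Configure hdfs-site.xml
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>

    - name: Configure core-site.xml
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://NAMENODE_IP:9001</value>
            </property>
          </configuration>

    - name: Start the Data Node daemon
      command: hadoop-daemon.sh start datanode
```

Pointing fs.default.name at the name node's address is what connects the data node to the cluster.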
The rest of the process is the same as for the Name Node.
Now we just have to run the ansible-playbook command, and we are done.
Thanks for reading!