HDFS Architecture

A brief look at how big companies like Facebook store the massive amount of incoming data they receive every day.

Samkit Shah
2 min read · Sep 28, 2020


Facebook receives around 600TB+ of data every single day. Users upload pictures, write posts, send messages, and do tons of other things, making it a massive data wonderland. Have you ever wondered how they store this much data every day? You might have had questions like: do they store the data on their own hard disks with enormous storage capacity, or do they buy storage hardware from other companies?

When we have an immense amount of data that is beyond our storage capacity, that problem is called a Big Data problem. To solve it, one could say, “We can use a storage device with a massive storage capacity.” That is true, but in this fast and rapidly growing world a single storage device like a hard disk is not enough on its own. For example, suppose storing 1GB of data on a hard disk takes 1 minute; storing 600TB+ each day would then take an impossibly long time. RAM is faster than an HDD, but we want to store persistent data, so if we rely on a single hard disk we run into an I/O processing bottleneck.

Here comes the concept of a Distributed Storage Cluster, a more agile way of storing large volumes of data arriving at high velocity. With this architecture we can achieve parallelism. A Distributed Storage Cluster has one master node and several slave nodes, and the data entering the master node gets distributed across all the slave nodes. For example, say the master node receives 40GB of data that would take 40 minutes to write to a single disk, and there are four slave nodes that can each store 10GB. In this master-slave architecture we split the data, so 40GB becomes 4 × 10GB, and each 10GB block is written to a slave node at the same time. Hence we solve both the problem of huge volumes of data and the rate at which the data is transferred. This Big Data problem can be easily solved using Hadoop and its HDFS, which is also used by Facebook (see the sketch below).
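The snippet below is a minimal sketch of that idea, not real HDFS code: a “master” splits incoming data into equal blocks and writes them to the slave nodes in parallel instead of serially. The node names, block sizes, the write speed, and the helper functions are all assumptions made up for illustration, using the 40GB / 4-node numbers from the example above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLAVE_NODES = ["slave-1", "slave-2", "slave-3", "slave-4"]  # hypothetical nodes
WRITE_SPEED_GB_PER_MIN = 1                                  # "1GB per minute" from the example

def write_block(node: str, size_gb: int) -> str:
    """Pretend to write one block to one slave node."""
    time.sleep(0.01 * size_gb)  # stand-in for real disk I/O
    return f"{node}: stored {size_gb}GB block"

def distribute(total_gb: int) -> None:
    # Split the data evenly: 40GB -> 4 x 10GB blocks
    blocks = [total_gb // len(SLAVE_NODES)] * len(SLAVE_NODES)
    # Write all blocks at the same time, one per slave node
    with ThreadPoolExecutor(max_workers=len(SLAVE_NODES)) as pool:
        for result in pool.map(write_block, SLAVE_NODES, blocks):
            print(result)
    serial = total_gb / WRITE_SPEED_GB_PER_MIN       # 40 minutes on one disk
    parallel = max(blocks) / WRITE_SPEED_GB_PER_MIN  # ~10 minutes across 4 disks
    print(f"serial write: {serial:.0f} min, parallel write: ~{parallel:.0f} min")

if __name__ == "__main__":
    distribute(40)
```

The point of the sketch is only the arithmetic: splitting the write across four nodes cuts the time from roughly 40 minutes to roughly 10, which is exactly the volume-and-velocity gain that HDFS's block-based, master/slave (NameNode/DataNode) design is built around.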

