Online Tutorials & Training Materials |
Register Login

What is Hadoop?

|| || 0

What is Hadoop?

Hadoop Introduction

Hadoop is an open source big data framework developed by Doug Cutting in the year 2006. It is managed by the Apache Software Foundation. The project was named after Hadoop, a yellow stuff toy which Cutting’s son had.

Hadoop is designed to store and process huge volumes of data efficiently.

Hadoop framework comprises of two main components:

  • HDFS - It stands for Hadoop Distributed File System. It takes care of storing and managing the data within the Hadoop cluster.
  • MapReduce - It takes care of processing and managing the data present within the HDFS.

What makes up a Hadoop cluster?

1) A Hadoop cluster is made up of two nodes. A node is a technical term described to denote a machine or a computer present within a cluster. The two nodes present in the Hadoop cluster are:

  • Master Node- It is responsible for running the Name node and the Job Tracker Demons.
  • Slave Node- It is responsible for running the Data node and the Task Tracker Demons.

2) Demon is a technical tern used to describe a background process that is running on a Linux machine.

3) Name node and Data node are responsible for storing and managing the data. They are commonly referred to as Storage Node

4) Job Tracker and Task Tracker are responsible for processing the data and are commonly referred to as Compute Node.

5) Generally, Name node and Job Tracker are configured on the same machine whereas Data node and Task Tracker are configured on multiple machines, but can have instances of running in more than one machine at the same time.

6) There is also a secondary name node as part of the Hadoop cluster.

Read Next Key Features of Hadoop

Related Articles