What is Rack Awareness?
Rack awareness is the process of making Hadoop aware of which machine belongs to which rack and how the racks are connected to each other within the Hadoop cluster.
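In practice, Hadoop is usually given this information through a topology script that the administrator registers in the cluster configuration (the property is net.topology.script.file.name in recent Hadoop releases; older releases may use a different name). Hadoop calls the script with one or more addresses as arguments and expects one rack path per address on standard output. The following is a minimal sketch of such a script. The IP addresses, the rack names and the resolve helper are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps each host argument to a rack path."""
import sys

# Illustrative mapping only; these addresses and rack names are invented.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Fallback for hosts that are missing from the mapping.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per address passed on the command line.
    print(" ".join(resolve(sys.argv[1:])))
```

A real cluster would more likely read the mapping from a file than hard-code it, so that adding machines does not require editing the script.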
Advantages of using Rack Awareness
Hadoop keeps multiple copies of all the data stored in HDFS. If you make Hadoop aware of the rack topology, it can place the copies of each piece of data on different racks. That way, even if an entire rack fails for some reason, you can still retrieve the data from a different rack.
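The idea of spreading copies across racks can be sketched as follows. This is a simplified illustration, not HDFS's actual placement code: it keeps the first copy on the writing node, puts the second on a node in another rack, and prefers the second copy's rack for the third. The function name place_replicas, the node names and the rack names are all hypothetical.

```python
def place_replicas(rack_of, writer, replication=3):
    """Pick target nodes so the copies span more than one rack when possible."""
    targets = [writer]  # first copy stays on the node that writes the data
    writer_rack = rack_of[writer]
    # Candidate nodes, sorted so the sketch is deterministic.
    remote = sorted(n for n in rack_of if rack_of[n] != writer_rack)
    local = sorted(n for n in rack_of if rack_of[n] == writer_rack and n != writer)

    if remote and replication > 1:
        second = remote[0]  # second copy goes to a different rack
        targets.append(second)
        # Prefer the second copy's rack for the next copy, then anything else.
        same_rack = [n for n in remote[1:] if rack_of[n] == rack_of[second]]
        other = [n for n in remote[1:] if rack_of[n] != rack_of[second]]
        rest = same_rack + other + local
    else:
        rest = local  # single-rack cluster: no rack diversity is possible

    for node in rest:
        if len(targets) >= replication:
            break
        targets.append(node)
    return targets[:replication]


# Hypothetical four-node cluster spread over two racks.
cluster = {
    "node1": "/rack1",
    "node2": "/rack1",
    "node3": "/rack2",
    "node4": "/rack2",
}
chosen = place_replicas(cluster, "node1")
```

With this cluster the chosen nodes cover both racks, so losing either rack still leaves at least one copy reachable.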
MapReduce jobs also benefit from rack awareness. Knowing where the data required by a map task is located, the job tracker can run the map task on the machine that holds that data, thereby saving a lot of bandwidth and time. Such a task is referred to as a data local task.
Sometimes all the machines that hold a copy of the required data are busy processing other map tasks, and none of them has a free slot for this task. In this case the job tracker runs the map task on another machine within the same rack that has a slot available.
To do so, it needs to stream the data to that machine. Since the data is streamed within the same rack, the bandwidth consumption is lower than it would be across racks. Such a task is commonly referred to as a rack local task.
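The preference order described above (data local first, then rack local) can be sketched as a small selection function. This is an illustrative model rather than the job tracker's real scheduling code. The function pick_node, the slot counts and the node names are hypothetical, and the final "off-rack" fallback is an assumption added to make the sketch complete.

```python
def pick_node(replica_nodes, free_slots, rack_of):
    """Return (node, locality) for a map task, preferring the closest free slot."""
    # 1. Data local: a node that already holds a copy and has a free slot.
    for node in replica_nodes:
        if free_slots.get(node, 0) > 0:
            return node, "data-local"

    # 2. Rack local: a free node on the same rack as one of the copies.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in sorted(free_slots):
        if free_slots[node] > 0 and rack_of[node] in replica_racks:
            return node, "rack-local"

    # 3. Assumed fallback: any free node, even on another rack.
    for node in sorted(free_slots):
        if free_slots[node] > 0:
            return node, "off-rack"

    return None, "no-slot"


# Hypothetical cluster: the data lives only on node1, which is busy.
rack_of = {"node1": "/rack1", "node2": "/rack1", "node3": "/rack2"}
free_slots = {"node1": 0, "node2": 1, "node3": 2}
choice = pick_node(["node1"], free_slots, rack_of)
```

Here node1 holds the data but has no free slot, so the task lands on node2 in the same rack, even though node3 has more free slots.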