Read data from HDFS using Java
1. For a client to read data from a Hadoop cluster, it needs the Hadoop client library, which is packaged as a Java JAR file. It also needs the cluster configuration, which is required to locate the Namenode.
2. Let us assume the client wants to read a file named log.txt, which is located in the folder /user/hadoop.
log.txt ----> /user/hadoop
3. The client begins the read process by contacting the Namenode and specifying the name and location of the file it wants to read.
4. The Namenode then tries to validate the user.
5. Once the user has been validated, the Namenode checks whether the requesting user has the appropriate permissions to access the file.
6. If the file exists and the requesting user has the appropriate permissions, the Namenode responds to the client with the ID of the first block of the requested file, along with a list of the Datanodes that hold a copy of that block, sorted by their distance from the client.
7. Now that the client has the block ID and the Datanode locations, it can contact the nearest Datanode directly and read the block from it.
8. This process of reading data from the Datanodes repeats until all the data blocks of the requested file have been retrieved or the client cancels it by closing the stream.
9. During the read process, if the Datanode fails or its process dies for some reason, the client automatically tries to read the data from the next Datanode in the list that holds a copy of the requested data.
10. If a copy of the requested data is unavailable on all the Datanodes, the read process fails.
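
The block-by-block read and the failover between Datanodes described in the steps above can be illustrated with a small sketch. This is only a toy, in-memory model and not the Hadoop client library: the names HdfsReadSketch, Datanode, BlockLocation and readFile, as well as the block IDs, are assumptions made up for illustration. It assumes the Namenode has already answered with an ordered list of blocks, each with its replicas sorted nearest first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the read path; none of these types are real Hadoop classes.
class HdfsReadSketch {

    // Hypothetical stand-in for a Datanode: maps block IDs to block contents.
    static class Datanode {
        final String name;
        final Map<String, byte[]> blocks = new HashMap<>();
        boolean alive = true;

        Datanode(String name) {
            this.name = name;
        }

        // Returns the block, or throws if this node is down or lacks the block.
        byte[] readBlock(String blockId) throws IOException {
            if (!alive || !blocks.containsKey(blockId)) {
                throw new IOException(name + " cannot serve " + blockId);
            }
            return blocks.get(blockId);
        }
    }

    // Hypothetical stand-in for a Namenode answer: a block ID plus the
    // Datanodes holding a replica, assumed to be sorted nearest first.
    static class BlockLocation {
        final String blockId;
        final List<Datanode> replicas;

        BlockLocation(String blockId, List<Datanode> replicas) {
            this.blockId = blockId;
            this.replicas = replicas;
        }
    }

    // Reads the blocks in order. For each block, the replicas are tried in
    // the given order; the read fails if no replica can serve the block.
    static byte[] readFile(List<BlockLocation> locations) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation location : locations) {
            byte[] data = null;
            for (Datanode datanode : location.replicas) {
                try {
                    data = datanode.readBlock(location.blockId);
                    break; // this replica answered, stop trying others
                } catch (IOException e) {
                    // this replica failed, fall through to the next one
                }
            }
            if (data == null) {
                throw new IOException("No replica available for " + location.blockId);
            }
            out.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Datanode dn1 = new Datanode("dn1");
        Datanode dn2 = new Datanode("dn2");
        byte[] first = "first block, ".getBytes(StandardCharsets.UTF_8);
        byte[] second = "second block".getBytes(StandardCharsets.UTF_8);
        dn1.blocks.put("blk_1", first);
        dn2.blocks.put("blk_1", first);
        dn1.blocks.put("blk_2", second);
        dn2.blocks.put("blk_2", second);

        dn1.alive = false; // simulate a failed Datanode

        List<BlockLocation> locations = Arrays.asList(
                new BlockLocation("blk_1", Arrays.asList(dn1, dn2)),
                new BlockLocation("blk_2", Arrays.asList(dn2, dn1)));

        // dn1 is down, so blk_1 is served by dn2 instead.
        System.out.println(new String(readFile(locations), StandardCharsets.UTF_8));
        // prints "first block, second block"
    }
}
```

In a real client this logic is not written by hand. The Hadoop client library hides it behind the stream returned by FileSystem.open(Path), so the application simply reads from an FSDataInputStream and closes it when done.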