HBase Interview Questions

Updated Feb 12, 2019

What is HBase in Hadoop?

HBase is a distributed, column-oriented database built on top of HDFS (Hadoop Distributed File System). It provides fast lookups for large tables. As part of the Hadoop ecosystem, it allows real-time read/write access to data stored in HDFS.

What are the main server components of HBase?

The main server components of HBase are:

  • HMaster – It assigns regions to Region Servers and rebalances them to distribute the load. It also monitors the state of the HBase cluster.
  • Region Server – It serves read, write, update and delete requests from clients.
  • ZooKeeper – It is the coordination service that clients consult to locate Region Servers. It also monitors the servers and maintains configuration information.

What is the syntax of the describe command in HBase?

The syntax of the describe command in the HBase shell is:

hbase> describe 'table_name'

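How many masters are possible in HBase?

An HBase cluster has one active Master at any time, typically serving three or more Region Servers. Additional backup Masters can run on standby, and one of them takes over if the active Master fails.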

How are HBase tables divided?

HBase tables are divided horizontally into Regions by row key range. Each Region holds all the rows between its start key and its end key. The Regions are then assigned to the Region Servers.
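
As a rough illustration of partitioning by row key range, the sketch below models the Regions of a table as a sorted list of start keys and finds the Region for a row key with a binary search. The split keys and the `find_region` helper are hypothetical and exist only for this example; this is not how the HBase client itself is implemented.

```python
from bisect import bisect_right

# Hypothetical split points; each Region covers [start_key, next_start_key).
# The first Region starts at the empty key and the last one is open-ended.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def find_region(row_key: bytes) -> int:
    """Return the index of the Region whose key range contains row_key."""
    # bisect_right counts the start keys <= row_key; subtract 1 for the index.
    return bisect_right(REGION_START_KEYS, row_key) - 1


for key in (b"apple", b"grape", b"melon", b"zebra"):
    print(key.decode(), "-> region", find_region(key))
```

Because the start key is inclusive and the end key is exclusive, a row key equal to a split point (for example `b"g"`) lands in the Region that starts there.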

What is a row key in an HBase table?

Row keys are used to index the rows of an HBase table. They can be thought of as the addresses of the rows. Data is stored sorted lexicographically by row key.
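
Because row keys are compared as raw bytes rather than as numbers, numeric identifiers sort in a possibly surprising order unless they are padded to a fixed width. The sketch below mimics that ordering with plain Python byte strings; the key values are made up for illustration.

```python
# Row keys are compared byte by byte, so b"10" sorts before b"9".
raw_keys = [b"9", b"10", b"100", b"2"]
print(sorted(raw_keys))     # [b'10', b'100', b'2', b'9']

# Zero-padding to a fixed width makes byte order match numeric order.
padded_keys = [key.zfill(5) for key in raw_keys]
print(sorted(padded_keys))  # [b'00002', b'00009', b'00010', b'00100']
```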

Why choose HBase over Cassandra?

The main reasons for selecting HBase over Cassandra are:

| HBase | Cassandra |
| --- | --- |
| It provides automatic rebalancing within the cluster. | It provides rebalancing, but not for the entire cluster. |
| It allows global range scanning. | Range scans can be done only in one partition. |
| Based on the CAP theorem, HBase works on the CP model (Consistency and Partition Tolerance). | Based on the CAP theorem, Cassandra works on the AP model (Availability and Partition Tolerance). |
| Consistency is high. | Consistency is not as high as in HBase. |
| HBase provides coprocessors, which provide a runtime environment and libraries for running user-generated code. | This feature is not present in Cassandra. |

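When should BucketCache be used in HBase?

By default, HBase uses a single on-heap cache. When BucketCache is enabled, the two caches work as tiers: the on-heap cache holds Bloom filters and indexes, while data blocks are stored in the off-heap BucketCache. BucketCache is worth using when the block cache is too large to keep on the heap without long garbage collection pauses, because data blocks held off-heap reduce garbage collection pressure. In file mode, the cache can also be backed by solid state disks.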

What is a column family in HBase?

Columns in HBase are grouped into column families. All columns in a column family share the same storage settings and are stored together. Column families are defined when the table is created, whereas individual columns can be added to a family at any time. Column family names must consist of printable characters.
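
One way to picture this is as nested maps: row key, then column family, then column, then value. The sketch below models that with Python dictionaries. The table contents and the `put` helper are hypothetical; they only illustrate that families are fixed up front while columns inside a family can be added freely.

```python
# Column families are fixed when the (hypothetical) table is created.
COLUMN_FAMILIES = {"info", "stats"}

# row key -> column family -> column -> value
table = {}


def put(row_key: bytes, family: str, column: str, value: bytes) -> None:
    """Store a cell, rejecting column families that were never declared."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


put(b"user1", "info", "name", b"Ada")
put(b"user1", "info", "email", b"ada@example.com")  # new column, no schema change
put(b"user1", "stats", "logins", b"42")
print(table[b"user1"]["info"])
```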

What is the HBaseFsck class?

The HBaseFsck class is used to check table integrity and region consistency in a corrupted HBase installation. It works in two modes: a read-only mode that identifies inconsistencies, and a multi-phase read-write mode that repairs them.

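What is a Bloom filter in HBase?

HBase uses Bloom filters to test whether an HFile might contain a specific row or row-column cell. Because a Bloom filter can rule out HFiles that definitely do not contain the requested key, it avoids unnecessary disk reads and improves the overall throughput of the HBase cluster.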
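
To make the idea concrete, here is a minimal Bloom filter sketch in Python. It is a generic illustration of the data structure, not HBase's implementation; the class name, bit-array size and hashing scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for row_key in (b"row-001", b"row-002"):
    bloom.add(row_key)

print(bloom.might_contain(b"row-001"))  # True: added keys are never missed
print(bloom.might_contain(b"row-999"))  # usually False; false positives are possible
```

A `False` answer means the key is certainly not in the file, so the file can be skipped. A `True` answer only means the file has to be read to find out.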

How can HBase be accessed remotely?

Remote access to HBase is available through the Java API. Clients written in other languages can go through the REST or Thrift gateways.

How to clear a zombie table from the HBase table list?

Steps to clear a zombie table:

  • In the ZooKeeper CLI, run ls /hbase/table to find the table you want to remove.
  • Run rmr /hbase/table/TABLE_NAME.
  • Restart the cluster.

What is the Thrift API in HBase?

Thrift is a framework for building services that can be called from multiple programming languages. The HBase Thrift interface lets clients written in those languages access HBase through a Thrift server, which in turn talks to HBase using the Java client.

How to improve bulk load performance in HBase?

The steps to improve bulk load performance in HBase are:

  • Analyse the size of the data and work out the number of regions needed in HBase.
  • Create an empty table that is pre-split at the region boundaries.
  • Use a partitioner in Spark to split the RDD so that its partitions match the region splits.
  • Create the HFiles using Spark.
  • Load the data into HBase using standard commands.
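
The pre-split and partitioning steps above can be sketched in plain Python. The example assumes a representative sample of row keys is available; `choose_split_keys` and `partition_for` are hypothetical helpers, and in a real job the same logic would sit inside a custom Spark partitioner rather than run standalone.

```python
from bisect import bisect_right


def choose_split_keys(sample_keys, num_regions):
    """Pick num_regions - 1 split keys that divide the sorted sample evenly."""
    ordered = sorted(set(sample_keys))
    step = len(ordered) / num_regions
    return [ordered[int(i * step)] for i in range(1, num_regions)]


def partition_for(row_key, split_keys):
    """Return the index of the partition (region) a row key falls into."""
    # A key equal to a split key belongs to the region that starts there.
    return bisect_right(split_keys, row_key)


# Hypothetical sample of 100 row keys: user0000 .. user0099.
sample = [f"user{n:04d}".encode() for n in range(100)]
splits = choose_split_keys(sample, num_regions=4)
print(splits)  # [b'user0025', b'user0050', b'user0075']
print(partition_for(b"user0030", splits))  # 1
```

Using the same split keys for both the table and the partitioner means each partition produces HFiles that belong to exactly one region.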

What are the various filters available in Apache HBase?

The different filters used in Apache HBase include:

  • Compare Filter – This is used to filter based on a comparison condition.
  • Family Filter – This is used to obtain the required data based on a column family.
  • Key Only Filter – This is used to obtain only the key component of each key-value pair.
  • Inclusive Stop Filter – This is used to stop the scan at the given row, including that row in the results.
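
The effect of the last two filters can be imitated with a small Python sketch. This is a simulation over an in-memory dictionary, not the HBase filter API; the `scan` function, its parameters and the sample rows are all hypothetical.

```python
# Hypothetical table contents: row key -> value.
rows = {b"row1": b"a", b"row2": b"b", b"row3": b"c", b"row4": b"d"}


def scan(start_row, stop_row, inclusive_stop=False, keys_only=False):
    """Yield (key, value) pairs in key order between start_row and stop_row."""
    for key in sorted(rows):
        if key < start_row:
            continue
        if key > stop_row or (key == stop_row and not inclusive_stop):
            break
        # A key-only scan drops the value and returns just the key part.
        yield (key, b"" if keys_only else rows[key])


print(list(scan(b"row1", b"row3")))                       # stop row excluded
print(list(scan(b"row1", b"row3", inclusive_stop=True)))  # stop row included
print(list(scan(b"row2", b"row4", keys_only=True)))       # values stripped
```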

What is the difference between HBase and Hive?

The major differences between HBase and Hive are:

| HBase | Hive |
| --- | --- |
| Used for data storage. | It is a query engine. |
| It is used for transactional processing. | It is used for batch processing. |
| Operations are run in real time. | Operations are not run in real time. |
| It has low latency. | It has high latency. |
| It is cost effective. | It is costly. |

What is YCSB and what is it used for?

YCSB (Yahoo! Cloud Serving Benchmark) is a benchmarking framework developed at Yahoo! Labs. It provides a common set of workloads for evaluating the performance of NoSQL database management systems, and it can be used to test HBase performance. It has two components:

  • The workload generator, called the YCSB client.
  • Core workloads that can be executed by the generator.

What is ZooKeeper and how does HBase use it?

ZooKeeper is a distributed coordination service, and it is the point through which clients find the Region Servers they need to talk to. HBase uses it to maintain the state of the servers in a cluster, and ZooKeeper coordinates this information among the members of the distributed system. The active HMaster and the Region Servers each connect to ZooKeeper through a session.

What is standalone mode in HBase?

HBase has two run modes, standalone and distributed. In standalone mode, HBase uses the local file system rather than HDFS (Hadoop Distributed File System). It runs a local ZooKeeper and all the HBase daemons within the same JVM process.

What is a region server?

A region server serves read, write, update and delete requests from clients for the regions it hosts. It handles all data-related operations and decides the size of its regions, splitting a region when it grows too large.

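Can HBase handle unstructured data?

Yes. HBase stores values as uninterpreted byte arrays and lets columns be added dynamically within a column family, so it can store unstructured and semi-structured data. It also does not strictly follow the ACID properties of an RDBMS, trading them for better scalability.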

