Register Login

Hbase Interview Questions and Answers

Updated Apr 23, 2025

What is HBase in Hadoop?

HBase is basically a distributed database that is developed over HDFS (Hadoop File System). It provides fast lookups for large data tables. As it is part of the Hadoop ecosystem, it allows real-time access to read/write data in HDFS.

What are HBase main server components?

  • HMaster: Handles the processes where regions are allocated to Region Servers for balancing the load and managing cluster conditions.
  • Region Server: Handles read, write, update, and delete requests by communicating with the clients.
  • Zookeeper: Acts as the communication point between clients and Region Servers and handles configuration information.

What is the syntax for describing a Command in Hbase?

The syntax is:

<hbase> describe 'table name'

How many masters are possible in HBase?

In HBase, a cluster consists of one Master and three or more Region Servers.

How are HBase tables divided?

HBase tables are divided horizontally into Regions using row key ranges. These Regions are allocated to Region Servers.

What is rowkey in HBase table?

The row keys are used to index the tables in HBase. They are like addresses and data is lexicographically sorted using the row key.

Why to choose Hbase over Cassandra?

HBase Cassandra
Provides automatic rebalancing within the cluster. Provides rebalancing, but not for the entire cluster.
Allows global range scanning. Range scans can be done only in one partition.
Works on CP model (Consistency and Partition Tolerance). Works on AP model (Availability and Partition Tolerance).
Consistency is high. Not as high as HBase.
Supports coprocessors for running user-generated code. No such feature.

When to use bucket cache in HBase?

BucketCache is used for off-heap data block storage, often with SSDs. The on-heap cache is used for Bloom filters and indexes.

What is a column family in Hbase?

Column families group columns with similar features. They are defined at table creation and consist of printable characters.

Explain HBaseFsck class?

The HBaseFsck class is used to evaluate table integrity and region consistency in corrupted HBase. It supports read-write repair and read-only detection modes.

Explain the bloom filter in HBase?

Bloom filters in HBase help test if an HFile includes a specific row-col cell. This improves the cluster's performance.

How to get remote access from Hbase?

Remote access can be achieved using the Java API.

How to clear zombie table in Hbase list?

  1. Check Zookeeper using ls /hbase/table to locate the table.
  2. Run rmr /hbase/table/TABLE_NAME.
  3. Restart the cluster.

What is thrift API in hbase?

The Thrift API enables multi-language services. It allows HBase access via a Thrift server using a Java client.

How to improve bulk load performance in hbase?

  1. Analyze data size and number of regions.
  2. Create an empty table with pre-split regions.
  3. Use Spark to partition the RDD based on splits.
  4. Create HFiles using Spark.
  5. Load data into HBase using standard commands.

What are the various filters are available in Apache HBase?

  • Compare Filter: Filters based on comparison conditions.
  • Family Filter: Filters based on column family.
  • Key Only Filter: Returns only the key component of key-value pairs.
  • Inclusive Stop Filter: Stops at a specific row.

What is the difference between hbase and Hive?

HBase Hive
Used for data storage. Used as a query engine.
Supports transactional processing. Supports batch processing.
Real-time operations. Not real-time.
Low latency. High latency.
Cost-effective. More costly.

What is YCSB and what is the use of YCSB?

The Yahoo! Cloud Serving Benchmark (YCSB) is used to test HBase performance. It has:

  • A workload generator (YCSB client).
  • Core workloads for benchmarking NoSQL DBMSs.

What is Hbase zookeeper and how Hbase uses zookeeper?

The zookeeper acts as the communication point for clients and region servers. HBase uses it to maintain server states and coordinate among distributed system components.

What is a standalone distribution in HBase?

In standalone mode, HBase runs all components, including Zookeeper, in a single JVM and uses the local file system instead of HDFS.

What is a region server?

Region Server handles all data operations like read, write, update, delete, and manages regions within HBase.

Can Hbase handle unstructured data?

Yes, HBase can handle unstructured data as it sacrifices strict ACID compliance for scalability.


×