Major Compaction in HBase

In HBase, a major compaction rewrites all the HFiles (store files) of each store, that is, each column family within a region, into a single HFile, dropping deleted data and excess old versions along the way. This improves read performance and reduces disk usage. However, major compaction is I/O intensive, so it should be scheduled carefully.

Below are various ways to execute major compaction in HBase:


1. Triggering Major Compaction Using HBase Shell

You can use the HBase shell to initiate a major compaction on:

  • An entire table
  • A single region
  • A single column family of a table

Command for a Table:

hbase> major_compact 'your_table_name'

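Command for a Region: You need the region name or its encoded name. Both appear in the output of hbase> status 'detailed' and on the table's page in the Master web UI; recent HBase versions also provide hbase> list_regions 'your_table_name'.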

hbase> major_compact 'region_encoded_name'

Command for a Column Family:

hbase> major_compact 'your_table_name', 'your_column_family'

2. Using HBase Admin API (Java)

If you need to programmatically trigger a major compaction (for automation or monitoring systems), you can use the HBase Admin API.

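import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseMajorCompaction {
    public static void main(String[] args) throws Exception {
        // Loads hbase-site.xml from the classpath
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {

            // Major compact the entire table. The Admin API expects a TableName, not a String.
            TableName tableName = TableName.valueOf("your_table_name");
            // The call is asynchronous: it returns once the request has been submitted,
            // not when the compaction has finished.
            admin.majorCompact(tableName);
            // For a single column family, Admin also offers majorCompact(TableName, byte[]).
            System.out.println("Major compaction requested for table: " + tableName);
        }
    }
}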

3. Verifying Compaction Progress

After triggering the compaction, you can verify whether it’s in progress or completed.

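  • Check logs: On the RegionServers, check the RegionServer log (typically named hbase-<user>-regionserver-<hostname>.log) for messages related to compaction.
  • HBase Web UI: The table page in the HBase Master UI (usually at http://<master-host>:16010) shows the table's compaction state, and each RegionServer UI (usually on port 16030) lists the compactions it is currently running.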
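
For automation, the same check can be done by polling. The sketch below is only an illustration of the polling loop: CompactionWaiter and waitUntilIdle are hypothetical names, and the state is supplied as a plain string so the example runs without a cluster. In a real client the supplier would typically wrap Admin.getCompactionState(tableName).name(), which reports NONE, MINOR, MAJOR or MAJOR_AND_MINOR and throws IOException, so it needs a try/catch inside the lambda.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper: polls a compaction-state source until it reports NONE.
final class CompactionWaiter {

    // Returns true once the state is "NONE", false if maxPolls is exhausted
    // or the waiting thread is interrupted.
    static boolean waitUntilIdle(Supplier<String> state, int maxPolls, long sleepMillis) {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            if ("NONE".equals(state.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a real cluster: reports MAJOR twice, then NONE.
        Iterator<String> fake = Arrays.asList("MAJOR", "MAJOR", "NONE").iterator();
        boolean finished = waitUntilIdle(fake::next, 5, 10L);
        System.out.println("compaction finished: " + finished);
    }
}
```

Because a compaction request is asynchronous, the state may still read NONE for a short while right after the request is submitted, so a real script would usually wait briefly before the first poll.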

4. Scheduling Major Compaction with Configuration

You can configure automatic major compactions in HBase by modifying the hbase-site.xml file.

<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value> <!-- Compaction every 24 hours (in ms) -->
</property>

<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value> <!-- Introduce jitter to avoid compactions at the same time -->
</property>
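
  • Jitter randomizes the interval by up to the given fraction (0.5 means plus or minus 50%), so that regions do not all run their major compaction at the same time.
  • In recent HBase versions the default interval is 604800000 ms (7 days) with a jitter of 0.5. Setting hbase.hregion.majorcompaction to 0 disables time-based major compactions, which is a common choice when they are triggered manually instead.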
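
To make the effect of these two settings concrete, here is a small sketch that computes the window in which a time-based major compaction can fall. It assumes the jitter is applied as a symmetric fraction of the interval, as the HBase reference guide describes; MajorCompactionWindow and windowMillis are hypothetical names used only for this illustration, not HBase classes.

```java
import java.time.Duration;

// Hypothetical helper: derives the earliest and latest interval implied by
// hbase.hregion.majorcompaction and hbase.hregion.majorcompaction.jitter.
final class MajorCompactionWindow {

    // Returns {earliest, latest} in milliseconds for interval +/- interval * jitter.
    static long[] windowMillis(long intervalMillis, double jitter) {
        long spread = Math.round(intervalMillis * jitter);
        return new long[] { intervalMillis - spread, intervalMillis + spread };
    }

    public static void main(String[] args) {
        // Values taken from the hbase-site.xml example above
        long[] window = windowMillis(86_400_000L, 0.5);
        System.out.println("earliest: " + Duration.ofMillis(window[0]).toHours() + " h");
        System.out.println("latest:   " + Duration.ofMillis(window[1]).toHours() + " h");
    }
}
```

With a 24-hour interval and a jitter of 0.5, this prints 12 h and 36 h, so an individual store may be major-compacted anywhere between 12 and 36 hours after its previous major compaction.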

5. Important Considerations for Major Compaction

  1. Performance Impact: Major compaction is I/O heavy and can impact cluster performance. It’s recommended to run it during off-peak hours.
  2. Disk Usage: Ensure enough disk space is available, as compaction temporarily requires extra space to write the new HFile.
  3. Automatic vs Manual: Major compactions can be scheduled to run automatically, but manual compactions might still be needed after heavy write operations.
  4. Data Deletions and TTL: Major compaction is required for deleted data (marked as tombstones) and expired data (via TTL) to be permanently removed.

By following these steps, you can effectively manage major compaction in HBase to improve performance and reclaim space.
