Major Compaction in HBase

In HBase, a major compaction rewrites all the HFiles (store files) of each store (one store per column family per region) into a single HFile, dropping deleted data and excess versions along the way. This improves read performance and reduces disk usage. However, major compaction is I/O intensive, so it should be scheduled carefully.

Below are various ways to execute major compaction in HBase:


1. Triggering Major Compaction Using HBase Shell

You can use the HBase shell to initiate a major compaction on:

  • An entire table
  • A single region
  • A single column family

Command for a Table:

hbase> major_compact 'your_table_name'

Command for a Region: You need the region name or its encoded name (the regions hosted by each RegionServer are listed by hbase> status 'detailed').

hbase> major_compact 'region_encoded_name'

Command for a Column Family:

hbase> major_compact 'your_table_name', 'your_column_family'

2. Using HBase Admin API (Java)

If you need to programmatically trigger a major compaction (for automation or monitoring systems), you can use the HBase Admin API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseMajorCompaction {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {

            // Major compact the entire table (the Admin interface expects a TableName, not a String)
            TableName tableName = TableName.valueOf("your_table_name");
            // Asynchronous: this only requests the compaction and returns immediately
            admin.majorCompact(tableName);
            System.out.println("Major compaction triggered on table: " + tableName);
        }
    }
}

3. Verifying Compaction Progress

After triggering the compaction, you can verify whether it’s in progress or completed.

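  • Check logs: On the RegionServers, check the RegionServer log (typically named hbase-<user>-regionserver-<hostname>.log) for messages related to compaction.
  • HBase Web UI: The HBase Master UI (usually at http://<master-host>:16010) shows a table's compaction state on its table page, and each RegionServer UI (usually at http://<regionserver-host>:16030) shows compaction metrics for the regions it hosts.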
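
The same check can be automated by polling the compaction state until it returns to idle. The following is a minimal sketch of such a loop, not a definitive implementation. To stay runnable without a cluster, the state is read from a plain Supplier<String>; the class name CompactionWatcher, the method waitUntilIdle, and the simulated states are hypothetical. In a real client the supplier would presumably wrap a call such as admin.getCompactionState(tableName).name(), which should be verified against your HBase client version.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class CompactionWatcher {

    /**
     * Polls the supplied state until it reads "NONE" or maxPolls is exhausted.
     * Returns true if the idle state was observed, false on timeout or interrupt.
     */
    static boolean waitUntilIdle(Supplier<String> stateSupplier, int maxPolls, long sleepMs) {
        for (int i = 0; i < maxPolls; i++) {
            if ("NONE".equals(stateSupplier.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated sequence of states (assumption: stands in for a live cluster)
        Iterator<String> simulated = Arrays.asList("MAJOR", "MAJOR", "NONE").iterator();
        boolean finished = waitUntilIdle(simulated::next, 10, 100L);
        System.out.println("Compaction finished: " + finished);
    }
}
```

Because a compaction request is asynchronous, the state may still read as idle for a moment right after the request, so a real poller would probably wait briefly before the first check.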

4. Scheduling Major Compaction with Configuration

You can configure automatic major compactions in HBase by modifying the hbase-site.xml file.

<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value> <!-- Compaction every 24 hours (in ms) -->
</property>

<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value> <!-- Introduce jitter to avoid compactions at the same time -->
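</property>

  • Jitter is a fraction of the interval: with 0.5 and a 24-hour interval, each region's major compaction becomes due somewhere between 12 and 36 hours after the previous one, which keeps all regions from compacting at the same time.
  • Setting hbase.hregion.majorcompaction to 0 disables time-based major compactions, so they only run when triggered manually.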
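
The effect of the two properties above can be worked out with simple arithmetic. The sketch below assumes the jitter is applied symmetrically as a fraction of the interval, which is how the property is documented. The class and method names are hypothetical and not part of HBase.

```java
import java.time.Duration;

public class MajorCompactionWindow {

    /** Earliest point at which a major compaction may become due. */
    static Duration earliest(long intervalMs, double jitter) {
        return Duration.ofMillis(intervalMs - Math.round(intervalMs * jitter));
    }

    /** Latest point at which a major compaction becomes due. */
    static Duration latest(long intervalMs, double jitter) {
        return Duration.ofMillis(intervalMs + Math.round(intervalMs * jitter));
    }

    public static void main(String[] args) {
        long intervalMs = 86_400_000L; // hbase.hregion.majorcompaction from the example
        double jitter = 0.5;           // hbase.hregion.majorcompaction.jitter from the example

        System.out.println("Earliest: " + earliest(intervalMs, jitter).toHours() + " h"); // 12 h
        System.out.println("Latest:   " + latest(intervalMs, jitter).toHours() + " h");   // 36 h
    }
}
```

A smaller jitter narrows this window and keeps compactions closer to the configured interval, at the cost of more regions compacting around the same time.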

5. Important Considerations for Major Compaction

  1. Performance Impact: Major compaction is I/O heavy and can impact cluster performance. It’s recommended to run it during off-peak hours.
  2. Disk Usage: Ensure enough disk space is available, as compaction temporarily requires extra space to write the new HFile.
  3. Automatic vs Manual: Major compactions can be scheduled to run automatically, but manual compactions might still be needed after heavy write operations.
  4. Data Deletions and TTL: Delete markers (tombstones) and the cells they mask are physically removed only by a major compaction; until then they keep occupying disk space. Data that has expired via TTL is purged as well.
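
When manual compactions are triggered from a scheduler or script, the off-peak recommendation can be enforced with a small guard. The following is a minimal sketch. The class name OffPeakWindow and the 01:00 to 05:00 window are assumptions chosen for illustration, and the actual compaction request is left out.

```java
import java.time.LocalTime;

public class OffPeakWindow {

    /**
     * Returns true if 'now' falls inside [start, end).
     * Handles windows that wrap past midnight (e.g. 22:00 to 05:00).
     * If start equals end, the window is treated as the whole day.
     */
    static boolean isOffPeak(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window wraps around midnight
        return !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(1, 0); // assumed off-peak start
        LocalTime end = LocalTime.of(5, 0);   // assumed off-peak end

        if (isOffPeak(LocalTime.now(), start, end)) {
            System.out.println("Inside the off-peak window: OK to request a major compaction");
        } else {
            System.out.println("Outside the off-peak window: skipping");
        }
    }
}
```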

By following these steps, you can effectively manage major compaction in HBase to improve performance and reclaim space.
