Major Compaction in HBase

In HBase, a major compaction rewrites all the HFiles (store files) of each store, meaning each column family within each region of a table, into a single HFile. While doing so, it removes deleted data and old versions that exceed the column family's version limit. This improves read performance, because each lookup has fewer files to merge, and it reduces disk usage. However, major compaction is I/O intensive, so it should be scheduled carefully.
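To make "removing deleted data and old versions" concrete, here is a toy model of one store being major-compacted. It is a simplified sketch, not HBase's implementation. The Cell record, the majorCompact method, and the rule that a delete marker hides every version of the same column at or before its timestamp are assumptions for illustration. Real HBase has several delete types and streams sorted HFiles rather than building maps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of a major compaction on ONE store (one column family of one region).
 * Simplified assumption: a delete marker hides every version of the same column
 * with an equal or older timestamp. Real HBase has several delete types.
 */
public class MajorCompactionModel {

    // A cell: row + qualifier + timestamp, carrying either a value or a delete marker.
    record Cell(String row, String qualifier, long ts, String value, boolean isDelete) {}

    /** Merges all store files into one output, dropping deleted cells, markers, and excess versions. */
    static List<Cell> majorCompact(List<List<Cell>> storeFiles, int maxVersions) {
        // Gather the cells of every file, grouped by column (TreeMap keeps output sorted by key).
        Map<String, List<Cell>> byColumn = new TreeMap<>();
        for (List<Cell> file : storeFiles) {
            for (Cell c : file) {
                byColumn.computeIfAbsent(c.row() + "/" + c.qualifier(), k -> new ArrayList<>()).add(c);
            }
        }
        List<Cell> output = new ArrayList<>();
        for (List<Cell> cells : byColumn.values()) {
            // Newest version first.
            cells.sort(Comparator.comparingLong(Cell::ts).reversed());
            // Timestamp of the newest delete marker, if any.
            long deleteTs = cells.stream().filter(Cell::isDelete)
                    .mapToLong(Cell::ts).max().orElse(Long.MIN_VALUE);
            cells.stream()
                    .filter(c -> !c.isDelete() && c.ts() > deleteTs) // drop markers and masked puts
                    .limit(maxVersions)                               // drop versions beyond the limit
                    .forEach(output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Cell> file1 = List.of(new Cell("r1", "a", 1, "v1", false),
                                   new Cell("r2", "a", 1, "x1", false));
        List<Cell> file2 = List.of(new Cell("r1", "a", 2, "v2", false),
                                   new Cell("r2", "a", 2, null, true)); // delete marker for r2/a
        List<Cell> file3 = List.of(new Cell("r1", "a", 3, "v3", false));

        // Three store files in, one out; keep only the newest version per column.
        for (Cell c : majorCompact(List.of(file1, file2, file3), 1)) {
            System.out.println(c.row() + "/" + c.qualifier() + "@" + c.ts() + "=" + c.value());
        }
        // prints: r1/a@3=v3
    }
}
```

With a version limit of 1, row r1 keeps only its newest value, and row r2 disappears entirely because its only put is masked by the delete marker.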

Below are various ways to execute major compaction in HBase:


1. Triggering Major Compaction Using HBase Shell

You can use the HBase shell to initiate a major compaction on:

  • An entire table
  • A single region

Command for a Table:

hbase> major_compact 'your_table_name'

Command for a Region: Pass the region name instead of a table name. You need the encoded region name, which is the hex string at the end of the full region name listed by hbase> status 'detailed'.

hbase> major_compact 'region_encoded_name'

Command for a Column Family:

hbase> major_compact 'your_table_name', 'your_column_family'
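For the region form, status 'detailed' prints full region names that look like your_table_name,<start key>,<region id>.<encoded name>. with a trailing dot. The following hypothetical helper pulls out the encoded part so you can paste it into major_compact. It is a sketch that assumes that name format. The class and method names are made up for this post and are not part of HBase.

```java
/**
 * Hypothetical helper (not an HBase API): extracts the encoded region name from a full
 * region name of the assumed form "<table>,<startKey>,<regionId>.<encodedName>."
 */
public class EncodedRegionName {

    static String encodedName(String fullRegionName) {
        // The name ends with a dot, and the encoded name sits between the last two dots.
        if (!fullRegionName.endsWith(".")) {
            throw new IllegalArgumentException("No encoded-name suffix in: " + fullRegionName);
        }
        String trimmed = fullRegionName.substring(0, fullRegionName.length() - 1);
        int lastDot = trimmed.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("No encoded-name suffix in: " + fullRegionName);
        }
        return trimmed.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        // Example full name with a made-up region id and encoded name.
        String full = "your_table_name,row-500,1700000000000.0123456789abcdef0123456789abcdef.";
        String encoded = encodedName(full);
        System.out.println(encoded);                           // 0123456789abcdef0123456789abcdef
        System.out.println("major_compact '" + encoded + "'"); // the shell command to run
    }
}
```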

2. Using HBase Admin API (Java)

If you need to programmatically trigger a major compaction (for automation or monitoring systems), you can use the HBase Admin API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseMajorCompaction {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.)
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {

            // Major compact the entire table; Admin.majorCompact takes a TableName, not a String
            TableName tableName = TableName.valueOf("your_table_name");
            // Asynchronous: returns once the request is submitted, not when compaction finishes
            admin.majorCompact(tableName);
            System.out.println("Major compaction requested for table: " + tableName);
        }
    }
}
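Because an automation job calls this on a schedule, it can be worth guarding the call so it only fires during off-peak hours. The sketch below is a hypothetical helper, not part of the HBase API. OffPeakWindow and the 01:00-05:00 window are illustrative assumptions, and in a real job the admin.majorCompact(...) call would go where the first println is.

```java
import java.time.LocalTime;

/**
 * Hypothetical off-peak guard for an automation job. The window hours are
 * illustrative assumptions, not HBase settings.
 */
public class OffPeakWindow {

    private final int startHour; // inclusive, 0-23
    private final int endHour;   // exclusive, 0-23; equal to startHour means an empty window

    OffPeakWindow(int startHour, int endHour) {
        if (startHour < 0 || startHour > 23 || endHour < 0 || endHour > 23) {
            throw new IllegalArgumentException("hours must be in 0..23");
        }
        this.startHour = startHour;
        this.endHour = endHour;
    }

    /** True if hour is in [startHour, endHour), handling windows that cross midnight. */
    boolean contains(int hour) {
        if (startHour <= endHour) {
            return hour >= startHour && hour < endHour;
        }
        // Window wraps past midnight, e.g. 22 -> 4
        return hour >= startHour || hour < endHour;
    }

    public static void main(String[] args) {
        OffPeakWindow window = new OffPeakWindow(1, 5); // assumed quiet period 01:00-05:00
        int now = LocalTime.now().getHour();
        if (window.contains(now)) {
            System.out.println("Off-peak: OK to request the major compaction");
        } else {
            System.out.println("Peak hours: skipping major compaction");
        }
    }
}
```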

3. Verifying Compaction Progress

After triggering the compaction, you can verify whether it’s in progress or completed.

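  • Check logs: On the RegionServers, look in the RegionServer log (typically named like hbase-<user>-regionserver-<hostname>.log) for compaction messages.
  • HBase Web UI: The HBase Master UI (by default at http://<master-host>:16010) shows a table's current compaction state on that table's page, and each RegionServer's UI (by default on port 16030) shows its compaction queue.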
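If you triggered the compaction from Java, you can also poll for completion. Admin.getCompactionState(TableName) returns a CompactionState (NONE, MINOR, MAJOR, or MAJOR_AND_MINOR). To keep this sketch runnable without a cluster, the state source is a Supplier fed by a fake sequence. Against a live cluster you would pass a lambda that calls admin.getCompactionState(tableName), wrapping its IOException in an UncheckedIOException, and compare against CompactionState.NONE. The class name, interval, and timeout are assumptions. One caveat: because the request is queued asynchronously, the state may still read NONE on the first poll before the compaction has actually started.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Polls a compaction-state source until it reports "done" or a timeout passes.
 * The Supplier stands in for admin.getCompactionState(tableName) so this runs without a cluster.
 */
public class CompactionPoller {

    /** Returns true if done(state) became true before the timeout, false otherwise. */
    static <T> boolean waitUntil(Supplier<T> state, Predicate<T> done,
                                 Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T current = state.get();
            System.out.println("Compaction state: " + current);
            if (done.test(current)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe deadline check
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake sequence of states standing in for a live cluster.
        Iterator<String> fake = List.of("MAJOR", "MAJOR", "NONE").iterator();
        boolean finished = waitUntil(fake::next, "NONE"::equals,
                                     Duration.ofMillis(10), Duration.ofSeconds(5));
        System.out.println(finished ? "Major compaction finished" : "Timed out");
    }
}
```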

4. Scheduling Major Compaction with Configuration

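You can configure automatic major compactions in HBase by modifying the hbase-site.xml file. hbase.hregion.majorcompaction is the interval between time-based major compactions in milliseconds (the default in recent versions is 604800000, i.e. 7 days), and setting it to 0 disables them.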

<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value> <!-- Compaction every 24 hours (in ms) -->
</property>

<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value> <!-- Introduce jitter to avoid compactions at the same time -->
</property>

  • Jitter spreads the schedule out: each store's next major compaction is shifted randomly by up to period × jitter in either direction, so regions do not all compact at the same moment.
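As a worked example of the jitter setting, the sketch below computes the window for the values in the config above: a 24-hour period with jitter 0.5 gives an interval between 12 and 36 hours. It assumes a simplified model in which the offset is drawn uniformly from ±round(period × jitter). HBase picks its own per-store offset internally, so treat the sampled value as illustrative only. The class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Simplified model (assumption) of major-compaction jitter: the effective interval is
 * period +/- round(period * jitter). HBase picks its own per-store offset internally.
 */
public class JitterWindow {

    static long minIntervalMs(long periodMs, double jitter) {
        return periodMs - Math.round(periodMs * jitter);
    }

    static long maxIntervalMs(long periodMs, double jitter) {
        return periodMs + Math.round(periodMs * jitter);
    }

    /** Samples one interval (approximately uniformly) from [min, max]. */
    static long sampleIntervalMs(long periodMs, double jitter, Random rnd) {
        long spread = Math.round(periodMs * jitter);
        return periodMs - spread + Math.round(2.0 * spread * rnd.nextDouble());
    }

    public static void main(String[] args) {
        long period = 86_400_000L; // hbase.hregion.majorcompaction from the example (24 h)
        double jitter = 0.5;       // hbase.hregion.majorcompaction.jitter from the example
        double msPerHour = 3_600_000.0;
        System.out.println("earliest: " + minIntervalMs(period, jitter) / msPerHour + " h"); // 12.0 h
        System.out.println("latest:   " + maxIntervalMs(period, jitter) / msPerHour + " h"); // 36.0 h
        System.out.println("sampled:  " + sampleIntervalMs(period, jitter, new Random()) / msPerHour + " h");
    }
}
```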

5. Important Considerations for Major Compaction

  1. Performance Impact: Major compaction is I/O heavy and can impact cluster performance. It’s recommended to run it during off-peak hours.
  2. Disk Usage: Ensure enough disk space is available, as compaction temporarily requires extra space to write the new HFile.
  3. Automatic vs Manual: Major compactions can be scheduled to run automatically, but manual compactions might still be needed after heavy write operations.
  4. Data Deletions and TTL: Major compaction is what guarantees that deleted data (masked by tombstone markers) and expired data (via TTL) are physically removed from all store files; until then, they are only hidden from query results.
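For the disk-usage point, a rough upper bound helps. The old HFiles are not removed until the new file has been written, and the new file can be as large as all of its inputs combined. The sketch below assumes that worst case and multiplies by the HDFS replication factor. The file sizes are hypothetical, and if several stores compact at once, you would sum their estimates.

```java
/**
 * Back-of-envelope sketch (assumption): during a major compaction of one store, the old
 * HFiles and the new one coexist, so the worst-case extra space is about the store's
 * current size again, times the HDFS replication factor.
 */
public class CompactionSpaceEstimate {

    /** Worst-case extra raw disk bytes: the output can be as large as all inputs combined. */
    static long worstCaseExtraRawBytes(long[] storeFileSizes, int replication) {
        long total = 0;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total * replication;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long[] storeFiles = {10 * gib, 8 * gib, 2 * gib}; // hypothetical HFile sizes of one store
        long extra = worstCaseExtraRawBytes(storeFiles, 3); // dfs.replication defaults to 3
        System.out.println("Worst-case extra raw space: " + extra / gib + " GiB"); // 60 GiB
    }
}
```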

By following these steps, you can effectively manage major compaction in HBase to improve performance and reclaim space.
