Difference between truncate and truncate_preserve in HBase

In the HBase shell, both truncate and truncate_preserve delete all the data in a table and keep its schema (column families and table settings). The difference is in how they handle the table’s regions: truncate discards the region boundaries, while truncate_preserve keeps them.

1. truncate

  • Purpose: Deletes all data and regions in a table and then recreates it with the same schema.
  • What It Does: When you run truncate, it:

    1. Disables the table.
    2. Drops the table, deleting all of its data and all of its regions (including the region split points).
    3. Recreates the table from the original schema as a single region.
    4. Leaves the recreated table enabled.
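  • Effect on Regions: After truncate, all region split points are lost, so the table starts over with a single region. That is fine for development or a quick data purge, but it is a poor fit for large tables that rely on pre-split regions for load balancing: until the new region grows large enough to split, every write goes to that one region and the RegionServer hosting it.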

  • Command Syntax:

    truncate 'table_name'
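The steps above can be modeled with a toy Python sketch (not HBase code). It assumes a table can be reduced to its column families, its split keys (N split keys give N + 1 regions), and a dict of rows. The `ToyTable` class and `truncate` function are hypothetical names for illustration only; the real command runs on the HBase master and also handles disabling and enabling the table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""
    name: str
    column_families: tuple[str, ...]
    split_keys: list[bytes] = field(default_factory=list)  # N split keys -> N + 1 regions
    rows: dict[bytes, dict[str, bytes]] = field(default_factory=dict)

    def region_count(self) -> int:
        return len(self.split_keys) + 1


def truncate(table: ToyTable) -> ToyTable:
    """Toy model of `truncate`: drop data and regions, recreate with the same schema.

    The split keys are NOT carried over, so the new table has a single region.
    """
    return ToyTable(name=table.name, column_families=table.column_families)


# Hypothetical pre-split table with two rows.
orders = ToyTable("orders", ("cf",), split_keys=[b"g", b"n", b"t"],
                  rows={b"a1": {"cf:q": b"v"}, b"z9": {"cf:q": b"v"}})
fresh = truncate(orders)
print(fresh.region_count(), len(fresh.rows), fresh.column_families)
# prints: 1 0 ('cf',)
```

The schema survives, but the four-region layout collapses to one region.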
    

2. truncate_preserve

  • Purpose: Deletes all data in a table but preserves its schema and its region boundaries (split points).
  • What It Does: When you use truncate_preserve, it:

    1. Disables the table.
    2. Deletes all data and drops the existing regions, but keeps their boundaries (split points).
    3. Recreates the table with the original schema and with new, empty regions at the same boundaries, then leaves it enabled.
  • Effect on Regions: Because the table keeps its original split points, writes after the truncation are spread over the same key ranges as before. This avoids the hot-spotting and the repeated region splits that a single-region table goes through as it refills, which matters most for large, pre-split tables.

  • Command Syntax:

    truncate_preserve 'table_name'
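As a toy Python sketch of the difference (not HBase code), the only change from a plain truncate is that the split keys are copied into the recreated table. It assumes a table can be reduced to column families, split keys (N split keys give N + 1 regions), and rows; `ToyTable` and `truncate_preserve` here are hypothetical names, not the HBase client API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""
    name: str
    column_families: tuple[str, ...]
    split_keys: list[bytes] = field(default_factory=list)  # N split keys -> N + 1 regions
    rows: dict[bytes, dict[str, bytes]] = field(default_factory=dict)

    def region_count(self) -> int:
        return len(self.split_keys) + 1


def truncate_preserve(table: ToyTable) -> ToyTable:
    """Toy model of `truncate_preserve`: drop data, recreate with the same schema
    and new, empty regions at the same boundaries (a copy of the split keys)."""
    return ToyTable(name=table.name,
                    column_families=table.column_families,
                    split_keys=list(table.split_keys))


# Hypothetical pre-split table with two rows.
orders = ToyTable("orders", ("cf",), split_keys=[b"g", b"n", b"t"],
                  rows={b"a1": {"cf:q": b"v"}, b"z9": {"cf:q": b"v"}})
fresh = truncate_preserve(orders)
print(fresh.region_count(), len(fresh.rows), fresh.split_keys)
# prints: 4 0 [b'g', b'n', b't']
```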
    

Summary of Differences

Feature                        truncate                             truncate_preserve
Deletes all data               Yes                                  Yes
Keeps table schema             Yes                                  Yes
Preserves region boundaries    No (restarts with one region)        Yes (same split points)
Typical use case               Development/testing, small tables    Large tables with pre-split regions

When to Use Each

  • Use truncate when you want a clean slate without worrying about the original region splits, such as in testing or development.
  • Use truncate_preserve in production for large tables that have been pre-split, especially when you want to keep the table’s load distribution and avoid waiting for regions to split again as data comes back in.
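The load-distribution argument can be illustrated with a small Python sketch. It assumes HBase’s routing rule that a row belongs to the region whose [start key, end key) range contains its row key; the `region_for` and `load_per_region` helpers and the sample keys are hypothetical and do not call HBase. With no split keys (the state right after truncate) every write lands in one region, while the preserved split keys (the state after truncate_preserve) spread the same writes across four regions.

```python
from bisect import bisect_right
from collections import Counter


def region_for(row_key: bytes, split_keys: list[bytes]) -> int:
    """Index of the region holding row_key (hypothetical helper).

    Region i covers [split_keys[i-1], split_keys[i]); a key equal to a split
    key belongs to the region that starts at that key, hence bisect_right.
    """
    return bisect_right(split_keys, row_key)


def load_per_region(row_keys: list[bytes], split_keys: list[bytes]) -> list[int]:
    """Count how many of the given rows land in each region."""
    counts = Counter(region_for(k, split_keys) for k in row_keys)
    return [counts[i] for i in range(len(split_keys) + 1)]


# Hypothetical write batch spread across the key space.
writes = [b"apple", b"banana", b"grape", b"kiwi",
          b"nectarine", b"pear", b"tangerine", b"zucchini"]

print(load_per_region(writes, []))                  # after truncate: [8]
print(load_per_region(writes, [b"g", b"n", b"t"]))  # after truncate_preserve: [2, 2, 2, 2]
```

In a real cluster the single region would eventually split as it grows, and a pre-split table can still be unevenly loaded if the row keys are skewed; the sketch only shows where each key would be routed.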
