← Back to blog

PostgreSQL 18: BTree vs Hash Indexes — A Benchmark

PostgreSQLBenchmarkDatabasePerformance

I benchmarked PostgreSQL 18.4 with 100K rows per table, comparing BTree vs Hash indexes across three column types (char(1), varchar(10), varchar(1000)) and two data distributions: 3 distinct values (status codes) and random values (26–100K distinct). Every table has a char(100) padding column to simulate a realistic row width. All queries are warm-cache with 5 warmup + 30 timed iterations; p50 is reported.

Storage

Data Type Heap BTree Hash
Status (3 distinct) char(1) 14 MB 704 KB 6,592 KB
varchar(10) 14 MB 696 KB 6,248 KB
varchar(1000) 112 MB 712 KB 6,064 KB
Random (26–100K distinct) char(1) 14 MB 704 KB 5,984 KB
varchar(10) 14 MB 3,104 KB 4,680 KB
varchar(1000) 112 MB 156 MB 4,112 KB

With 3 distinct values, BTree deduplication (default since PG 13) stores each key once with a posting list. All BTree indexes are ~700 KB regardless of key width. Hash indexes can't deduplicate — they're 8–9x larger at 6 MB because every row gets its own bucket entry.

With random values the story flips for wide keys. The varchar(1000) BTree is 156 MB (larger than the heap), while the Hash index is only 4 MB — a 38x reduction. Hash stores just a 4-byte hash code per row regardless of key width.

Query Performance: WHERE status = ?

Data Type BTree (p50) Hash (p50) BTree scan Hash scan
Status (3 distinct) char(1) 4.15 ms 7.93 ms Index Only Seq Scan
varchar(10) 4.19 ms 6.41 ms Index Only Seq Scan
varchar(1000) 10.90 ms 21.88 ms Index Only Seq Scan
Random (26–100K distinct) char(1) 0.24 ms 1.30 ms Index Only Bitmap
varchar(10) 0.06 ms 0.05 ms Index Only Index Scan
varchar(1000) 0.06 ms 0.05 ms Index Only Index Scan

For low-cardinality status columns, the planner completely skips the Hash index and uses a Seq Scan — with 33K matching rows per value, 33K random heap lookups via the hash index would be slower than reading the 14 MB table sequentially. The deduplicated BTree (700 KB) enables Index Only Scans that never touch the heap.

For high-cardinality point lookups (finding 1 row), BTree and Hash are neck-and-neck at sub-millisecond. The varchar(1000) Hash index is 38x smaller (4 MB vs 156 MB), but since the entire BTree still fits in memory, traversal is fast in warm-cache tests.

WHERE status IN (...) and GROUP BY

Query Type BTree (p50) Hash (p50)
IN (2 values) char(1) 3.3 ms 10.89 ms
varchar(10) 3.36 ms 10.09 ms
GROUP BY char(1) 7.55 ms 14.09 ms
varchar(10) 8.22 ms 14.72 ms

IN queries and GROUP BY further disadvantage Hash on low-cardinality data. BTree indexes return rows in order and stay compact with deduplication. Hash indexes provide no ordering, and for these distributions the planner often falls back to Seq Scan-based plans.

Takeaways

  • Low-cardinality (status codes, enums): Always use BTree. Deduplication makes the index ~700 KB regardless of key width, enables Index Only Scans, and the planner won't use a Hash index here. char(1) saves heap space; the index size is the same.
  • High-cardinality wide keys (UUIDs, long strings): Hash indexes are dramatically smaller (4 MB vs 156 MB for varchar(1000)). For pure equality lookups they match BTree speed. But they don't support ordering, ranges, uniqueness, or Index Only Scans.
  • High-cardinality short keys: BTree and Hash are similar in size and speed. BTree wins on generality — use it unless you have a specific reason for Hash.
  • Never use Hash for status columns: the planner ignores it in favor of a Seq Scan, wasting 6 MB of disk and memory.

Methodology

  • PostgreSQL: 18.4 (Debian), Docker, shared_buffers=512MB, work_mem=64MB
  • Rows: 100,000 per table, 3 distinct (status) or 26/100K distinct (random)
  • Timing: 5 warmup iterations discarded, 30 recorded, p50 reported
  • Indexes: standard BTree (deduplication enabled) vs Hash, default fillfactor
  • Reproduce: bench.sh in the repository — fully automated

Have a project or want to discuss an idea?

Contact me
© 2026 Giacomo Rizzi
VAT ID: 05476830269
Via Antica Torre 31, San Polo di Piave, 31020 TV, Italy