PostgreSQL 18: BTree vs Hash Indexes — A Benchmark
I benchmarked PostgreSQL 18.4 with 100K rows per table, comparing BTree vs Hash indexes across three column types (char(1), varchar(10), varchar(1000)) and two data distributions: 3 distinct values (status codes) and random values (26–100K distinct). Every table has a char(100) padding column to simulate a realistic row width. All queries are warm-cache with 5 warmup + 30 timed iterations; p50 is reported.
Storage
| Data | Type | Heap | BTree | Hash |
|---|---|---|---|---|
| Status (3 distinct) | char(1) |
14 MB | 704 KB | 6,592 KB |
varchar(10) |
14 MB | 696 KB | 6,248 KB | |
varchar(1000) |
112 MB | 712 KB | 6,064 KB | |
| Random (26–100K distinct) | char(1) |
14 MB | 704 KB | 5,984 KB |
varchar(10) |
14 MB | 3,104 KB | 4,680 KB | |
varchar(1000) |
112 MB | 156 MB | 4,112 KB |
With 3 distinct values, BTree deduplication (default since PG 13) stores each key once with a posting list. All BTree indexes are ~700 KB regardless of key width. Hash indexes can't deduplicate — they're 8–9x larger at 6 MB because every row gets its own bucket entry.
With random values the story flips for wide keys. The varchar(1000) BTree is 156 MB (larger than the heap), while the Hash index is only 4 MB — a 38x reduction. Hash stores just a 4-byte hash code per row regardless of key width.
Query Performance: WHERE status = ?
| Data | Type | BTree (p50) | Hash (p50) | BTree scan | Hash scan |
|---|---|---|---|---|---|
| Status (3 distinct) | char(1) |
4.15 ms | 7.93 ms | Index Only | Seq Scan |
varchar(10) |
4.19 ms | 6.41 ms | Index Only | Seq Scan | |
varchar(1000) |
10.90 ms | 21.88 ms | Index Only | Seq Scan | |
| Random (26–100K distinct) | char(1) |
0.24 ms | 1.30 ms | Index Only | Bitmap |
varchar(10) |
0.06 ms | 0.05 ms | Index Only | Index Scan | |
varchar(1000) |
0.06 ms | 0.05 ms | Index Only | Index Scan |
For low-cardinality status columns, the planner completely skips the Hash index and uses a Seq Scan — with 33K matching rows per value, 33K random heap lookups via the hash index would be slower than reading the 14 MB table sequentially. The deduplicated BTree (700 KB) enables Index Only Scans that never touch the heap.
For high-cardinality point lookups (finding 1 row), BTree and Hash are neck-and-neck at sub-millisecond. The varchar(1000) Hash index is 38x smaller (4 MB vs 156 MB), but since the entire BTree still fits in memory, traversal is fast in warm-cache tests.
WHERE status IN (...) and GROUP BY
| Query | Type | BTree (p50) | Hash (p50) |
|---|---|---|---|
IN (2 values) |
char(1) |
3.3 ms | 10.89 ms |
varchar(10) |
3.36 ms | 10.09 ms | |
GROUP BY |
char(1) |
7.55 ms | 14.09 ms |
varchar(10) |
8.22 ms | 14.72 ms |
IN queries and GROUP BY further disadvantage Hash on low-cardinality data. BTree indexes return rows in order and stay compact with deduplication. Hash indexes provide no ordering, and for these distributions the planner often falls back to Seq Scan-based plans.
Takeaways
- Low-cardinality (status codes, enums): Always use BTree. Deduplication makes the index ~700 KB regardless of key width, enables Index Only Scans, and the planner won't use a Hash index here.
char(1)saves heap space; the index size is the same. - High-cardinality wide keys (UUIDs, long strings): Hash indexes are dramatically smaller (4 MB vs 156 MB for varchar(1000)). For pure equality lookups they match BTree speed. But they don't support ordering, ranges, uniqueness, or Index Only Scans.
- High-cardinality short keys: BTree and Hash are similar in size and speed. BTree wins on generality — use it unless you have a specific reason for Hash.
- Never use Hash for status columns: the planner ignores it in favor of a Seq Scan, wasting 6 MB of disk and memory.
Methodology
- PostgreSQL: 18.4 (Debian), Docker, shared_buffers=512MB, work_mem=64MB
- Rows: 100,000 per table, 3 distinct (status) or 26/100K distinct (random)
- Timing: 5 warmup iterations discarded, 30 recorded, p50 reported
- Indexes: standard BTree (deduplication enabled) vs Hash, default fillfactor
- Reproduce:
bench.shin the repository — fully automated