May 20, 2026

PostgreSQL 18: BTree vs Hash Indexes — A Benchmark

PostgreSQLBenchmarkDatabasePerformance

I benchmarked PostgreSQL 18.4 with 100K rows per table, comparing BTree vs Hash indexes across three column types (char(1), varchar(10), varchar(1000)) and two data distributions: 3 distinct values (status codes) and random values (26–100K distinct). Every table has a char(100) padding column to simulate a realistic row width. All queries are warm-cache with 5 warmup + 30 timed iterations; p50 is reported.

Storage

Data	Type	Heap	BTree	Hash
Status (3 distinct)	`char(1)`	14 MB	704 KB	6,592 KB
	`varchar(10)`	14 MB	696 KB	6,248 KB
	`varchar(1000)`	112 MB	712 KB	6,064 KB
Random (26–100K distinct)	`char(1)`	14 MB	704 KB	5,984 KB
	`varchar(10)`	14 MB	3,104 KB	4,680 KB
	`varchar(1000)`	112 MB	156 MB	4,112 KB

With 3 distinct values, BTree deduplication (default since PG 13) stores each key once with a posting list. All BTree indexes are ~700 KB regardless of key width. Hash indexes can't deduplicate — they're 8–9x larger at 6 MB because every row gets its own bucket entry.

With random values the story flips for wide keys. The varchar(1000) BTree is 156 MB (larger than the heap), while the Hash index is only 4 MB — a 38x reduction. Hash stores just a 4-byte hash code per row regardless of key width.

Query Performance: `WHERE status = ?`

Data	Type	BTree (p50)	Hash (p50)	BTree scan	Hash scan
Status (3 distinct)	`char(1)`	4.15 ms	7.93 ms	Index Only	Seq Scan
	`varchar(10)`	4.19 ms	6.41 ms	Index Only	Seq Scan
	`varchar(1000)`	10.90 ms	21.88 ms	Index Only	Seq Scan
Random (26–100K distinct)	`char(1)`	0.24 ms	1.30 ms	Index Only	Bitmap
	`varchar(10)`	0.06 ms	0.05 ms	Index Only	Index Scan
	`varchar(1000)`	0.06 ms	0.05 ms	Index Only	Index Scan

For low-cardinality status columns, the planner completely skips the Hash index and uses a Seq Scan — with 33K matching rows per value, 33K random heap lookups via the hash index would be slower than reading the 14 MB table sequentially. The deduplicated BTree (700 KB) enables Index Only Scans that never touch the heap.

For high-cardinality point lookups (finding 1 row), BTree and Hash are neck-and-neck at sub-millisecond. The varchar(1000) Hash index is 38x smaller (4 MB vs 156 MB), but since the entire BTree still fits in memory, traversal is fast in warm-cache tests.

`WHERE status IN (...)` and `GROUP BY`

Query	Type	BTree (p50)	Hash (p50)
`IN (2 values)`	`char(1)`	3.3 ms	10.89 ms
`IN (2 values)`	`varchar(10)`	3.36 ms	10.09 ms
`GROUP BY`	`char(1)`	7.55 ms	14.09 ms
`GROUP BY`	`varchar(10)`	8.22 ms	14.72 ms

IN queries and GROUP BY further disadvantage Hash on low-cardinality data. BTree indexes return rows in order and stay compact with deduplication. Hash indexes provide no ordering, and for these distributions the planner often falls back to Seq Scan-based plans.

Takeaways

Low-cardinality (status codes, enums): Always use BTree. Deduplication makes the index ~700 KB regardless of key width, enables Index Only Scans, and the planner won't use a Hash index here. char(1) saves heap space; the index size is the same.
High-cardinality wide keys (UUIDs, long strings): Hash indexes are dramatically smaller (4 MB vs 156 MB for varchar(1000)). For pure equality lookups they match BTree speed. But they don't support ordering, ranges, uniqueness, or Index Only Scans.
High-cardinality short keys: BTree and Hash are similar in size and speed. BTree wins on generality — use it unless you have a specific reason for Hash.
Never use Hash for status columns: the planner ignores it in favor of a Seq Scan, wasting 6 MB of disk and memory.

Methodology

PostgreSQL: 18.4 (Debian), Docker, shared_buffers=512MB, work_mem=64MB
Rows: 100,000 per table, 3 distinct (status) or 26/100K distinct (random)
Timing: 5 warmup iterations discarded, 30 recorded, p50 reported
Indexes: standard BTree (deduplication enabled) vs Hash, default fillfactor
Reproduce: bench.sh in the repository — fully automated

Storage

Query Performance: WHERE status = ?

WHERE status IN (...) and GROUP BY

Takeaways

Methodology

Have a project or want to discuss an idea?

Query Performance: `WHERE status = ?`

`WHERE status IN (...)` and `GROUP BY`