Database indexing beyond B-trees — selectivity, partial indexes, and write amplification

An index is a shortcut into a heap of rows. PostgreSQL-like systems default to B-tree indexes for equality and range scans because they keep keys ordered and tolerate high fan-out without degenerating into linked-list walks under typical OLTP cardinality.

The uncomfortable truth is that indexes are not free: every INSERT, UPDATE that touches indexed columns, and DELETE must update each relevant index leaf. Operators feel this as higher write latency, hotter disks, more WAL, longer autovacuum debt, and more pages to cache. Good teams index from measurable pain, not from schema nostalgia.

What the planner optimizes for

Roughly speaking, the planner estimates selectivity, ordering, join cardinality, available statistics, parallelism, and I/O budgets. A narrow index on a uniformly random UUID column scanning half the table rarely beats a sequential read of a modest table—it may still win on huge heaps, which is why “always index primary keys” and “sometimes skip indexes entirely” coexist as correct advice across different scales.

Interview-grade framing: indexing is negotiating between latency of one query family versus steady-state mutation cost.

Composite indexes and column order

For composite (a, b, c), the index can accelerate predicates that constrain a prefix of the key (depending on operators). Putting the most selective filtering column first is a heuristic, but real systems often lead with equality columns (country_id =) then range (created_at BETWEEN) following PostgreSQL cookbook guidance.

Anti-pattern teams love: stacking low-cardinality boolean flags first because “they appear in WHERE often.” Booleans explode neither selectivity nor sort usefulness—partial indexes beat “index everything including WHERE is_active."

Partial and expression indexes

Partial indexes materialize fewer rows (“only open tickets,” “only undeleted”). They shrink write amplification dramatically when the predicate matches how the application actually queries data.

Expression indexes stabilize planner choices for LOWER(email) or immutable date bucketing—but only when queries repeat the identical expression rather than drifting into client-side inconsistencies.

Both must stay aligned with migrations: renaming a predicate without renaming the concurrently created companion index silently starves optimizations.

Covering semantics and INCLUDE

PostgreSQL lets you INCLUDE non-key payloads so index-only scans can satisfy more projections without bouncing to the heap—when visibility map participation allows.

This is potent for read-heavy dashboards that repeatedly pull the same narrow columns (“status + due_at”). It increases index width, meaning fewer entries per page and more RAM pressure. Coverage is optimization, not a moral victory.

Write amplification checkpoints

Before adding “one more helpful index,” ask:

Question	Healthy answer
Which query regressed measurable P95 latency?	A saved `EXPLAIN (ANALYZE, BUFFERS)` snippet
How often does `INSERT`/bulk job run?	Back-of-envelope updates per minute
Could we reorganize workloads (materialized aggregates, partitioned tables)?	Documented spike windows
Is autovacuum keeping up?	Bloat dashboards, freeze age warnings

Skipping this audit causes the classic outage story: blazing reads in staging, creeping write stalls in prod Tuesday morning.

Operating habits that scale

Instrument statement timeouts for interactive roles, stash baseline plans beside releases, rotate someone through vacuum/heavy write rehearsals before Black Friday equivalents. Index changes should ship with reversible migrations—CREATE INDEX CONCURRENTLY on Postgres—or explicit maintenance windows where locks are understood.

Indexing is craftsmanship: every extra tree is debt with an interest coupon paid in microseconds at write time. Treat it accordingly.