Caches trade certainty about freshness for amortized cost and latency. The failure mode is seldom an obviously bad hit ratio; it surfaces as subtle revenue leaks (stale pricing), silent policy violations (ACL decisions served after revocation), and intermittent bugs that vanish from logs before you can correlate them.
Name your layers deliberately
| Layer | Typically owns | Primary invalidation knobs |
|---|---|---|
| Browser | Private assets | Cache-Control, ETag |
| CDN / Edge | Anonymous GET bodies | TTL, surrogate tags, purge APIs |
| Reverse proxy | Compression plus micro-cache | keyed routes, timeouts |
| App server | Aggregated computations | TTL plus events plus manual bust |
| Data store replicas | Reads behind leaders | replication lag allowances |
Skipping explicit ownership breeds cross-team friction: one team purges the reverse proxy while the CDN, run by another team, keeps serving the old blob.
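As a concrete example of the browser layer's knobs, here is a minimal sketch of conditional revalidation with ETag and Cache-Control. The header names and 304 semantics follow HTTP; the handler shape and hash truncation are illustrative, not a specific framework's API.

```python
import hashlib

def etag_for(body: bytes) -> str:
    # Strong ETag derived from content; any stable digest works.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match):
    """Return (status, headers, body), honoring conditional revalidation."""
    tag = etag_for(body)
    headers = {"ETag": tag, "Cache-Control": "private, max-age=60"}
    if if_none_match == tag:
        return 304, headers, b""   # client's copy is still fresh; skip the body
    return 200, headers, body

status, headers, _ = respond(b"logo-v1", None)           # first fetch: 200
status2, _, _ = respond(b"logo-v1", headers["ETag"])     # revalidation: 304
```

Within `max-age` the browser serves its copy without asking; after that it revalidates cheaply instead of re-downloading.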
Stampede control
Thundering herds hit the origin when hot keys expire together. Mitigations include probabilistic early refresh, locking or single-flight wrappers, jittered TTLs, and a thin layer of in-process memoization for bursty identical reads.
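Three of those mitigations can be combined in one small wrapper. A sketch under stated assumptions: `loader` is your hypothetical origin fetch, the early-refresh formula follows the XFetch idea (refresh probability rises near expiry, scaled by how long the value took to compute), and the lock table is in-process only.

```python
import math
import random
import threading
import time

_cache: dict = {}        # key -> (value, expires_at, compute_delta)
_locks: dict = {}        # key -> per-key lock for single-flight
_locks_guard = threading.Lock()

def get(key, loader, ttl=60.0, beta=1.0):
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at, delta = entry
        # Probabilistic early refresh: the closer to expiry, the likelier
        # one caller recomputes early, so hot keys don't expire in unison.
        if now + delta * beta * -math.log(1.0 - random.random()) < expires_at:
            return value
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:                      # single-flight: one recompute per key
        entry = _cache.get(key)     # another caller may have refreshed it
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        start = time.monotonic()
        value = loader(key)
        delta = time.monotonic() - start
        jittered_ttl = ttl * random.uniform(0.9, 1.1)   # jittered TTL
        _cache[key] = (value, time.monotonic() + jittered_ttl, delta)
        return value
```

The jitter spreads natural expiries; the early refresh spreads them further for hot keys; the lock guarantees at most one origin call per key even when both fail.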
A raw hit rate without context can lie. Cheap static assets tolerate low hit ratios; authoritative pricing reads do not.
Domain-aligned keys
Use version namespaces inside keys (e.g. `pricing:v2026-03-01`), include tenant identifiers, and avoid accidental cross-tenant reuse.
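A sketch of such a key builder; the schema constant and field layout are illustrative, not a fixed convention:

```python
PRICING_SCHEMA = "v2026-03-01"   # bump on any incompatible shape change

def cache_key(domain: str, schema: str, tenant: str, entity_id: str) -> str:
    # The tenant is part of the key, so tenants can never share entries;
    # the schema version namespaces away stale shapes after deploys.
    return f"{domain}:{schema}:tenant={tenant}:{entity_id}"

key = cache_key("pricing", PRICING_SCHEMA, "acme", "sku-123")
# -> "pricing:v2026-03-01:tenant=acme:sku-123"
```

Bumping the schema constant is also a cheap global bust: old entries are simply never read again and age out on their own.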
Prefer explicit event-driven eviction (publish “invalidate X” after writes) instead of guessing a global TTL and hoping the business agrees.
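A minimal sketch of that write-then-invalidate flow. The in-process bus stands in for a real transport (Redis pub/sub, Kafka, a CDN purge API); topic names and the `db` dict are illustrative.

```python
from collections import defaultdict

class InvalidationBus:
    """Tiny in-process stand-in for a real pub/sub system."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, key):
        for handler in self._subscribers[topic]:
            handler(key)

bus = InvalidationBus()
db = {"sku-123": 999}            # stand-in for the authoritative store
price_cache = dict(db)           # warmed cache copy

# Each caching layer subscribes and drops its own copy when told.
bus.subscribe("invalidate.pricing", lambda key: price_cache.pop(key, None))

def update_price(sku, cents):
    db[sku] = cents                          # 1. write the source of truth
    bus.publish("invalidate.pricing", sku)   # 2. then announce the change

update_price("sku-123", 1099)
# price_cache no longer holds the stale 999; the next read repopulates
```

The point is ordering and ownership: the writer announces the change once, and every layer that cached the value is responsible for reacting, instead of each guessing a TTL.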
Observability
Measure origin offload, how often stale data is explicitly served under policy, and eviction latency—not vanity counters alone. Tie spikes to deploys; partial rollouts often break warming assumptions.
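A minimal sketch of the counters those metrics imply; the class and field names are illustrative, and a real setup would export them to your metrics system rather than hold them in memory.

```python
class CacheStats:
    """Counters for origin offload, policy-stale serves, and eviction lag."""
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.stale_served = 0          # stale responses knowingly served under policy
        self.eviction_latencies = []   # seconds from invalidate publish to eviction

    def record(self, hit: bool, stale: bool = False):
        if hit:
            self.hits += 1
            if stale:
                self.stale_served += 1
        else:
            self.misses += 1

    def origin_offload(self) -> float:
        # Fraction of reads the origin never saw.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for hit in (True, True, True, False):
    stats.record(hit)
stats.origin_offload()   # 0.75
```

Segmenting these counters by deploy version is what makes the "tie spikes to deploys" step possible.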
Document staleness budgets in runbooks—for example “search index may lag writes by up to 60s”—so incidents inherit memory instead of improvisation.
Caching is a product decision encoded in infrastructure. Treat it that way.