Understanding and Solving Cache Stampede: The Invisible Threat to Databases
Imagine this:
Your high-traffic app is humming along smoothly. Most data requests are served instantly from Redis or another blazing-fast cache layer. But then a cache key expires, and within milliseconds, 10,000 clients hit your backend at once, all trying to fetch the same data from the database.
Boom — your production database is slammed, response times skyrocket, and the system becomes unresponsive.
You’ve just been trampled by a cache stampede.
What is a Cache Stampede?
A cache stampede (also called a cache miss storm) occurs when:
A popular cache key expires
Many clients attempt to read the same key
All of them get a cache miss
All of them bypass the cache simultaneously
They hit the backend/database at once
Overload happens — especially in high-concurrency environments
Even if your cache hit rate is 99%, a stampede on 1% of traffic can collapse the system.
Why Cache Stampedes Happen
Caches are typically built with a cache-aside pattern, where:
You check the cache first
If the key is missing, you recompute or fetch from the DB
Then store it back into the cache
This works great for individual requests… but under heavy concurrency, when many requests miss the same key, all of them go through this pattern at the same time:
```python
def get_data(key):
    data = redis.get(key)
    if not data:
        data = query_postgres()      # cache miss: hit the database
        redis.set(key, data, ex=60)  # repopulate with a 60s TTL
    return data
```
Without protection, this causes 1000s of concurrent query_postgres() calls.
Real-world Impact
A cache stampede causes:
Sudden spikes in DB traffic (often 10x to 100x)
Connection pool exhaustion
Slow queries, timeouts, or even crashes
Denial of service for downstream services
Resource contention across your stack
When Does This Happen?
After a cache TTL expires
After a cache eviction (due to memory pressure)
During cold starts or deployment rollouts
When there’s only one shared cache key for a popular item (e.g., homepage data)
How to Mitigate Cache Stampedes
There are several ways to mitigate cache stampedes.
1. Request Coalescing / Single-flight Pattern
Let only one request rebuild the cache, while others wait for it to finish.
Concept:
First request takes a lock and fetches the data from DB
Others wait for that request to populate the cache
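The idea above can be sketched in-process with a small single-flight helper. This is an illustrative implementation, not a library API: with a shared cache like Redis, the equivalent is an atomic `SET key NX EX` lock where the winner rebuilds and the other callers poll or subscribe. (Error propagation to waiters is omitted for brevity.)

```python
import threading

class SingleFlight:
    """Coalesce concurrent rebuilds of the same key into one execution."""

    def __init__(self):
        self._mu = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fn):
        with self._mu:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader and runs fn().
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        done, holder = entry
        if leader:
            try:
                holder["value"] = fn()
            finally:
                with self._mu:
                    self._inflight.pop(key, None)
                done.set()
            return holder["value"]
        # Followers block until the leader has populated the result.
        done.wait()
        return holder["value"]
```

With this in place, a thousand concurrent misses on the same key trigger one `query_postgres()` call instead of a thousand.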
2. Add Jitter (Randomized Expiry)
Avoid simultaneous cache expiry by randomizing TTLs.
```python
import random

ttl = random.randint(50, 70)  # 60s base TTL with +/-10s of jitter
redis.set(key, data, ex=ttl)
```
Prevents a "herd" of keys expiring together
Works best in high-concurrency apps with shared TTLs
3. Serve Stale Data While Rebuilding
Don’t block the user when the cache expires.
Instead:
Return the stale value
Trigger a background refresh
This is known as “stale-while-revalidate”.
Implementation Approach:
Store TTL metadata separately
If TTL is expired, serve stale data and refresh in background thread
```python
# The value key never expires; a logical expiry timestamp lives beside it
if time.time() > float(redis.get(f"{key}:expires_at") or 0):
    async_refresh(key)   # rebuild in the background; don't block this request
return redis.get(key)    # serve the (possibly stale) value immediately
```
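A self-contained sketch of the whole pattern, with a plain dict standing in for Redis and `rebuild` standing in for the database query (both names are illustrative); a set guards against launching duplicate refreshes for the same key:

```python
import threading
import time

_store = {}          # stand-in for Redis: key -> (value, logical_expiry_ts)
_refreshing = set()  # keys with a background rebuild already in flight
_mu = threading.Lock()

def get_swr(key, rebuild, ttl=60):
    """Stale-while-revalidate: never block a request on a cache rebuild."""
    value, expires_at = _store.get(key, (None, 0.0))
    if value is None:
        # Cold miss: nothing stale to serve, so fetch synchronously once.
        value = rebuild()
        _store[key] = (value, time.time() + ttl)
        return value
    if time.time() > expires_at:
        with _mu:
            refresh_needed = key not in _refreshing
            _refreshing.add(key)
        if refresh_needed:
            def _refresh():
                try:
                    _store[key] = (rebuild(), time.time() + ttl)
                finally:
                    with _mu:
                        _refreshing.discard(key)
            # Kick off the rebuild without blocking the caller.
            threading.Thread(target=_refresh, daemon=True).start()
    return value  # possibly stale, but the user gets an instant response
```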
4. Proactive Cache Warming / Preload
Use background jobs to preload popular cache keys before they expire.
Example:
```python
hot_keys = ['homepage', 'top_products', 'user_analytics']
for key in hot_keys:
    data = query_postgres(key)   # recompute before the cached copy expires
    redis.set(key, data, ex=60)
```
Keeps your most important data hot
Ideal for dashboards, trending items, etc.
Use schedulers like cron, Celery beat, Sidekiq, etc.
5. Use Multi-Level Cache (L1 + L2)
Layered cache architecture:
L1 cache: In-process or in-memory (e.g. a Python LRUCache)
L2 cache: Redis / Memcached
L3: Database
Each layer reduces pressure on the next.
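A minimal sketch of that lookup path. The L2 is stubbed as a plain dict (it would be Redis in production) and `loader` stands in for the database query; the class and parameter names are illustrative:

```python
from collections import OrderedDict

class TwoLevelCache:
    """L1: small in-process LRU. L2: shared cache (dict here, Redis in prod).
    Misses fall through to `loader`, the stand-in for the database."""

    def __init__(self, loader, l1_size=128):
        self.l1 = OrderedDict()
        self.l1_size = l1_size
        self.l2 = {}          # stand-in for Redis / Memcached
        self.loader = loader  # stand-in for query_postgres

    def get(self, key):
        if key in self.l1:                # L1 hit: no network round trip
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:                # L2 hit: shields the database
            value = self.l2[key]
        else:                             # miss everywhere: hit the DB
            value = self.loader(key)
            self.l2[key] = value
        self.l1[key] = value
        if len(self.l1) > self.l1_size:   # evict the oldest L1 entry
            self.l1.popitem(last=False)
        return value
```

Even a tiny L1 absorbs repeated reads of hot keys within a single process, so a stampede that evicts or expires the L2 entry no longer translates one-to-one into database queries.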
6. Batch Writes (if cache miss triggers DB writes)
If the cache miss causes multiple writes, use queues like Kafka or Redis Streams to batch and smooth load.
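As a rough sketch of the batching idea, using an in-process queue in place of Kafka or a Redis Stream (the function and parameter names here are illustrative, and `flush(batch)` stands in for one bulk INSERT):

```python
import queue
import threading

def start_batcher(flush, batch_size=100, idle_timeout=0.25):
    """Background worker that coalesces individual writes into bulk flushes.

    Producers call q.put(item); putting None shuts the worker down.
    """
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()                      # block for the first write
            if item is None:
                return
            batch = [item]
            try:
                while len(batch) < batch_size:  # gather more writes briefly
                    item = q.get(timeout=idle_timeout)
                    if item is None:
                        flush(batch)            # flush what we have, then stop
                        return
                    batch.append(item)
            except queue.Empty:
                pass
            flush(batch)                        # one bulk write for N requests

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return q, t
```

The database then sees a handful of bulk writes per second instead of one write per request, which smooths the spike a stampede would otherwise produce.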
PostgreSQL Tips for Surviving Stampedes
If it still happens:
Use PgBouncer: Reduces connection overhead
Add read replicas: Distribute SELECT load
Materialized Views: Precompute expensive queries
Analyze slow queries: Use EXPLAIN ANALYZE + indexes
Use rate-limiting middleware: Prevent DoS
Conclusion
Cache stampedes are easy to miss in staging, but devastating in production. You don’t need a DDoS to bring down your system — just a single cache key expiring under high load.
The fix isn’t just “increase cache TTL” — it’s about smart architecture:
Let only one request rebuild
Serve stale when you can
Add randomness to TTL
Use background warming