Caching Strategy

Problem

Every data read that bypasses a cache hits the origin — a database, remote API, or expensive computation. At scale, the origin cannot serve all reads directly:

Without cache:
  1,000 req/s Ɨ 10ms DB latency = 10,000ms of DB time/second
  = 10 DB connections fully saturated just for reads

With 95% cache hit rate:
  1,000 req/s Ɨ 5% miss rate = 50 cache misses/s hitting DB
  = 0.5 DB connections needed for reads
  = 20Ɨ reduction in DB load

Adding a cache is the easy part. The hard parts are cache invalidation, cold starts, thundering herd on misses, and choosing the right placement strategy.

---

Why It Matters (Latency, Throughput, Cost)

The fundamental cache equation:

effective_latency = hit_rate Ɨ cache_latency + (1 - hit_rate) Ɨ origin_latency

Concrete example with Redis (0.5ms) and PostgreSQL (10ms):

hit_rate=0.50: 0.50Ɨ0.5ms + 0.50Ɨ10ms = 0.25 + 5.00 = 5.25ms
hit_rate=0.80: 0.80Ɨ0.5ms + 0.20Ɨ10ms = 0.40 + 2.00 = 2.40ms
hit_rate=0.95: 0.95Ɨ0.5ms + 0.05Ɨ10ms = 0.475 + 0.50 = 0.975ms
hit_rate=0.99: 0.99Ɨ0.5ms + 0.01Ɨ10ms = 0.495 + 0.10 = 0.595ms

Going from 80% to 99% hit rate: 4Ɨ latency reduction (2.4ms → 0.6ms).
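The table above follows directly from the equation; here is a minimal sketch that reproduces it (the 0.5ms and 10ms defaults are the Redis and PostgreSQL figures from this example):

```python
def effective_latency(hit_rate: float, cache_ms: float = 0.5, origin_ms: float = 10.0) -> float:
    """Expected per-read latency under the fundamental cache equation."""
    return hit_rate * cache_ms + (1 - hit_rate) * origin_ms

for hr in (0.50, 0.80, 0.95, 0.99):
    print(f"hit_rate={hr:.2f}: {effective_latency(hr):.3f}ms")
```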

The cache hit rate is everything. A 50% hit rate only halves latency; the large wins come above 90%. Target ≄90%.

Cost:

  • Redis on AWS ElastiCache: ~$0.025/GB-hour
  • RDS PostgreSQL: ~$0.25/GB-hour
  • Read replica: $0.125/GB-hour

At these prices, serving reads from cache is ~5Ɨ cheaper per GB than a read replica, and ~10Ɨ cheaper than the primary database.

---

Mental Model

Think of the memory hierarchy:

Level           Latency              Capacity    Cost/GB
────────────────────────────────────────────────────────
CPU L1 cache     4 cycles (~1ns)        32 KB    $10,000+
CPU L2 cache    12 cycles (~4ns)       256 KB    $10,000+
CPU L3 cache    40 cycles (~13ns)        8 MB    $10,000+
RAM            ~200 cycles (~65ns)   32-512 GB   $5-20
NVMe SSD       ~100 μs                 1-8 TB    $0.10-0.50
Network HDD    ~10 ms                 many TB    $0.02-0.10
Remote DB      ~10 ms/query             ---      varies
────────────────────────────────────────────────────────
Application cache (Redis): ~0.5ms over the network (ā‰ˆ 1.5M cycles at 3 GHz)

Caching moves data "up" the hierarchy. Redis at 0.5ms is to PostgreSQL at 10ms what L1 cache is to RAM.

---

Underlying Theory (OS / CN / DSA / Math Linkage)

Zipf Distribution — Why Caches Work

Real-world access patterns follow a Zipf (power law) distribution: the most popular item is accessed proportionally more than the second most popular, which is more than the third, etc.

P(rank=k) āˆ 1/k^s   (s ā‰ˆ 1 for web workloads)

Rank 1 item:   30% of all accesses
Rank 2 item:   15% of all accesses
Rank 3 item:   10% of all accesses
Top 20% of items: ~80% of all accesses  ← Pareto principle

If you cache only the top 20% of your item catalog, you achieve ~80% hit rate. This is why a small cache (10% of data size) can yield excellent hit rates.
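A quick numeric check of this claim (a sketch assuming s = 1 and a catalog of N = 1,000 items; the exact figure shifts with N):

```python
def zipf_hit_rate(cache_fraction: float, n_items: int = 1000, s: float = 1.0) -> float:
    """Fraction of accesses served from cache when the top items are cached."""
    weights = [1 / k**s for k in range(1, n_items + 1)]  # Zipf access weights
    top = int(cache_fraction * n_items)
    return sum(weights[:top]) / sum(weights)

print(f"cache top 20% -> {zipf_hit_rate(0.20):.0%} hit rate")
```

For N = 1,000 this lands near 79%, close to the 80% Pareto figure; larger catalogs push the number higher for the same cached fraction.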

LRU Algorithm: Doubly Linked List + Hash Map

The Least Recently Used (LRU) eviction algorithm in O(1) time:

Data structure:
  hash_map: key → node  (for O(1) lookup)
  doubly_linked_list:   (for O(1) move-to-front and evict-from-tail)

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”   ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”   ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”   ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”
│ MRU  │◄──│  B   │◄──│  C   │◄──│ LRU  │
│ head │──►│      │──►│      │──►│ tail │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”˜   ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”˜   ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”˜   ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
  (most recently used)             (evict next)

get(key):
  1. Look up node in hash_map → O(1)
  2. Move node to head of list → O(1)   (update 4 pointers)
  3. Return value

put(key, value):
  1. If key exists: update value, move to head → O(1)
  2. If cache full: remove tail from list AND hash_map → O(1)
  3. Insert new node at head, add to hash_map → O(1)

LFU vs LRU

  • LRU (Least Recently Used): evicts the item not accessed for the longest time. Simple, works well for temporal locality.
  • LFU (Least Frequently Used): evicts the item with the fewest total accesses. Better for Zipf workloads but complex (requires frequency counter + min-heap or bucket structure).
  • Redis: approximates LRU by sampling 5 keys per eviction (for performance) when an LRU maxmemory-policy such as allkeys-lru is configured; the default policy is noeviction.

ARC (Adaptive Replacement Cache): self-tuning between LRU and LFU. Used in ZFS, Solaris.
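To make the LFU trade-off concrete, here is a minimal O(1) LFU sketch using the bucket structure mentioned above (ties within a frequency bucket are broken LRU-style; a simplified illustration, not Redis's implementation):

```python
from collections import OrderedDict, defaultdict

class LFUCache:
    """O(1) LFU: frequency buckets of insertion-ordered keys."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._values: dict = {}
        self._freq: dict = {}                     # key -> access count
        self._buckets = defaultdict(OrderedDict)  # count -> keys, LRU order
        self._min_freq = 0

    def _touch(self, key) -> None:
        """Move key from bucket f to bucket f+1 on access."""
        f = self._freq[key]
        del self._buckets[f][key]
        if not self._buckets[f]:
            del self._buckets[f]
            if self._min_freq == f:
                self._min_freq = f + 1
        self._freq[key] = f + 1
        self._buckets[f + 1][key] = None

    def get(self, key):
        if key not in self._values:
            return None
        self._touch(key)
        return self._values[key]

    def put(self, key, value) -> None:
        if self.capacity <= 0:
            return
        if key in self._values:
            self._values[key] = value
            self._touch(key)
            return
        if len(self._values) >= self.capacity:
            # Evict the least-recently-used key of the lowest frequency
            evict, _ = self._buckets[self._min_freq].popitem(last=False)
            if not self._buckets[self._min_freq]:
                del self._buckets[self._min_freq]
            del self._values[evict]
            del self._freq[evict]
        self._values[key] = value
        self._freq[key] = 1
        self._buckets[1][key] = None
        self._min_freq = 1
```

Note the extra bookkeeping relative to LRU: a frequency map, per-frequency buckets, and a running minimum frequency — this is the "complex" part the bullet above refers to.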

---

Cache Placement Strategies

1. Cache-Aside (Lazy Loading) — Most Common

async def get_user(user_id: int) -> dict | None:
    cache_key = f"user:{user_id}"

    # 1. Check cache first
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)                   # cache hit

    # 2. Cache miss — fetch from DB
    user = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    if user is None:
        return None

    # 3. Populate cache
    await redis.setex(cache_key, 300, json.dumps(dict(user)))  # TTL: 5 min
    return dict(user)                                # cache miss + populate

Pros: Simple. Only caches what's actually requested. No startup overhead.

Cons: Cache miss adds DB latency. First request after cache cold is slow. Race condition possible (two requests both miss and both write).

2. Read-Through

The cache layer itself fetches from the origin on a miss. The application only talks to the cache.

class ReadThroughCache:
    def __init__(self, redis, db, ttl=300):
        self._redis = redis
        self._db = db
        self._ttl = ttl

    async def get(self, key: str, loader_fn):
        value = await self._redis.get(key)
        if value:
            return json.loads(value)
        # Cache handles the load
        value = await loader_fn()
        if value is not None:
            await self._redis.setex(key, self._ttl, json.dumps(value))
        return value

# Usage:
cache = ReadThroughCache(redis, db)
user = await cache.get(f"user:{user_id}", lambda: db.get_user(user_id))

3. Write-Through

Every write goes to both cache and DB synchronously. Cache is always consistent.

async def update_user(user_id: int, updates: dict) -> dict:
    # Write to DB first (source of truth)
    user = await db.fetchrow(
        "UPDATE users SET name=$1 WHERE id=$2 RETURNING *",
        updates["name"], user_id
    )

    # Write to cache immediately (write-through)
    await redis.setex(f"user:{user_id}", 300, json.dumps(dict(user)))
    return dict(user)

Pros: Cache always consistent after write. No stale reads.

Cons: Every write pays cache write cost. Cache fills with data that may never be read.

4. Write-Behind (Write-Back)

Writes go to cache immediately, then asynchronously flush to DB. Highest write throughput.

import asyncio
import json

class WriteBehindCache:
    def __init__(self, redis, db, flush_interval=1.0):
        self._redis = redis
        self._db = db
        self._dirty: dict[str, dict] = {}  # pending writes
        self._lock = asyncio.Lock()
        # Hold a reference so the flush task is not garbage-collected
        self._flush_task = asyncio.create_task(self._flush_loop(flush_interval))

    async def set(self, key: str, value: dict):
        await self._redis.set(key, json.dumps(value))
        async with self._lock:
            self._dirty[key] = value  # queue for async flush

    async def _flush_loop(self, interval: float):
        while True:
            await asyncio.sleep(interval)
            async with self._lock:
                pending = dict(self._dirty)
                self._dirty.clear()
            if pending:
                async with self._db.transaction():
                    for key, value in pending.items():
                        await self._db.upsert("users", value)

Pros: Writes are fast (in-memory only). DB batches many writes.

Cons: Data loss if cache crashes before flush. Complex. Risk of inconsistency.

---

Thundering Herd and Cache Stampede

When a popular key expires, many concurrent requests all see a cache miss simultaneously. All fire DB queries at once — thundering herd / cache stampede.

Solution 1: Mutex / Single-Flight

import asyncio

_locks: dict[str, asyncio.Lock] = {}

async def get_with_lock(key: str, loader_fn, ttl=300):
    cached = await redis.get(key)
    if cached:
        return json.loads(cached)

    # Ensure only one coroutine fetches at a time
    if key not in _locks:
        _locks[key] = asyncio.Lock()

    async with _locks[key]:
        # Double-check after acquiring lock (another coroutine may have loaded)
        cached = await redis.get(key)
        if cached:
            return json.loads(cached)

        value = await loader_fn()
        await redis.setex(key, ttl, json.dumps(value))
        return value

Solution 2: Probabilistic Early Revalidation (PER)

Refresh the cache slightly before expiry, with a probability that increases as the remaining TTL approaches zero:

import asyncio
import json
import math
import random
import time

async def get_with_per(key: str, loader_fn, ttl=300, beta=1.0):
    """
    Probabilistic Early Revalidation (PER) algorithm.
    Fetch is triggered before expiry with increasing probability.
    beta controls aggressiveness (higher = earlier revalidation).
    """
    entry = await redis.get(f"{key}:per")  # stores {value, computed_at, ttl}
    if entry:
        data = json.loads(entry)
        value = data["value"]
        computed_at = data["computed_at"]
        stored_ttl = data["ttl"]
        elapsed = time.time() - computed_at
        remaining = stored_ttl - elapsed

        # Recompute early with a probability that rises as the remaining
        # TTL shrinks (simplified XFetch; the full algorithm also scales
        # by the recompute time)
        if remaining <= 0 or (-beta * math.log(random.random()) >= remaining):
            # Recompute (in background, return cached value now)
            asyncio.create_task(_refresh(key, loader_fn, ttl))
        return value

    # Cache miss — must fetch
    value = await loader_fn()
    entry = {"value": value, "computed_at": time.time(), "ttl": ttl}
    await redis.setex(f"{key}:per", ttl + 60, json.dumps(entry))
    return value

async def _refresh(key, loader_fn, ttl):
    value = await loader_fn()
    entry = {"value": value, "computed_at": time.time(), "ttl": ttl}
    await redis.setex(f"{key}:per", ttl + 60, json.dumps(entry))

Solution 3: Bloom Filter for Negative Caching

Cache misses for non-existent keys are expensive. A Bloom filter provides an O(1), space-efficient check for "definitely not in DB". Rows inserted after the filter is built must also be added to it, or lookups for them will be wrongly rejected:

from pybloom_live import BloomFilter

# Bloom filter: compact set membership check
# False positive rate: ~1% with 10 bits per element
user_bloom = BloomFilter(capacity=1_000_000, error_rate=0.01)

# On startup, add all existing IDs to the bloom filter
for row in await db.fetchall("SELECT id FROM users"):
    user_bloom.add(row["id"])

async def get_user_with_bloom(user_id: int):
    # Quickly reject lookups for non-existent users (avoids DB miss)
    if user_id not in user_bloom:
        return None  # Definitely doesn't exist — no DB query needed

    cached = await redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    user = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    if user:
        await redis.setex(f"user:{user_id}", 300, json.dumps(dict(user)))
    return dict(user) if user else None

---

LRU Implementation (from scratch)

from collections import OrderedDict
from threading import Lock
from typing import Any

class LRUCache:
    """
    Thread-safe LRU cache backed by OrderedDict, which CPython
    implements as a doubly linked list over a dict.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: OrderedDict[str, Any] = OrderedDict()
        self._lock = Lock()

    def get(self, key: str):
        with self._lock:
            if key not in self._cache:
                return None
            self._cache.move_to_end(key)  # mark as recently used
            return self._cache[key]

    def put(self, key: str, value) -> None:
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)
                self._cache[key] = value
            else:
                self._cache[key] = value
                if len(self._cache) > self.capacity:
                    self._cache.popitem(last=False)  # evict LRU (first item)

    @property
    def stats(self) -> dict:
        with self._lock:
            return {"size": len(self._cache), "capacity": self.capacity}
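A quick eviction-order check (the class above restated in compact, unlocked form so the snippet runs standalone):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, key):
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)         # mark as recently used
        return self._cache[key]

    def put(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # evicts "b", not "a"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```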

---

Cache Invalidation Strategies

1. TTL-based (simplest)

await redis.setex(f"user:{user_id}", 300, json.dumps(user))  # expires in 5 minutes

Consistency window: up to TTL seconds.

Best for: semi-static data (user profiles, product catalog).

Avoid for: financial data, inventory counts, anything requiring strong consistency.

2. Event-driven invalidation

# On DB write, publish an invalidation event
async def update_user(user_id: int, updates: dict):
    user = await db.update_user(user_id, updates)

    # Invalidate cache immediately
    await redis.delete(f"user:{user_id}")

    # Publish event for other cache layers (CDN, other services)
    await pubsub.publish("cache.invalidate", {"entity": "user", "id": user_id})
    return user

# Other services listen and invalidate their local caches
async def on_cache_invalidate(message):
    entity = message["entity"]
    entity_id = message["id"]
    await local_cache.delete(f"{entity}:{entity_id}")

3. Version-based (cache tags)

async def get_user_posts(user_id: int):
    # Key includes a version number stored separately
    version = await redis.get(f"user:{user_id}:version") or "0"
    cache_key = f"user:{user_id}:posts:v{version}"

    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)

    posts = await db.fetchall("SELECT * FROM posts WHERE user_id = $1", user_id)
    await redis.setex(cache_key, 3600, json.dumps(posts))
    return posts

async def invalidate_user_posts(user_id: int):
    # Increment version — old keys automatically become orphaned
    await redis.incr(f"user:{user_id}:version")
    # Old cache entries expire via TTL naturally

---

Complexity Analysis

Operation             In-process LRU        Redis (network)
───────────────────────────────────────────────────────────
Cache hit             O(1), RAM latency     O(1), ~0.5ms
Cache miss + load     O(1) + O(query)       O(1) + O(query)
Cache eviction        O(1)                  O(1)
Invalidation (TTL)    O(1)                  O(1)
Invalidation (scan)   O(N)                  O(N) SCAN

Space complexity: O(capacity) for LRU cache.

---

Benchmark (p50, p99, CPU, Memory)

Setup: 10,000 req/s, Redis 7 (single node), PostgreSQL 15, Redis on same host (0.3ms RTT).

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Hit Rate / Config  │  p50   │  p99   │ DB Calls/s│ Redis Mem/GB  │
ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤
│ No cache           │  8ms   │ 22ms   │ 10,000   │ 0             │
│ 50% hit rate       │  4ms   │ 14ms   │  5,000   │ 0.5           │
│ 80% hit rate       │  2ms   │  7ms   │  2,000   │ 1.0           │
│ 95% hit rate       │ 0.8ms  │  2ms   │    500   │ 2.0           │
│ 99% hit rate       │ 0.5ms  │  1ms   │    100   │ 4.0           │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”“ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”“ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”“ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”“ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

CPU: Redis uses ~1 CPU core at 100k ops/sec.
     PostgreSQL CPU drops proportionally with hit rate.

---

Observability

import time
from functools import wraps

from prometheus_client import Counter, Histogram, Gauge

cache_ops = Counter('cache_operations_total', 'Cache ops', ['operation', 'result'])
# result: hit, miss, error, eviction

cache_latency = Histogram('cache_latency_seconds', 'Cache operation latency',
    ['operation'], buckets=[.0001, .0005, .001, .005, .010, .025, .100])

cache_memory = Gauge('cache_memory_bytes', 'Cache memory usage')

def track_cache(operation: str):
    def decorator(fn):
        @wraps(fn)  # preserve the wrapped function's name and docstring
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = await fn(*args, **kwargs)
                if result is None:
                    cache_ops.labels(operation, 'miss').inc()
                else:
                    cache_ops.labels(operation, 'hit').inc()
                return result
            except Exception:
                cache_ops.labels(operation, 'error').inc()
                raise
            finally:
                cache_latency.labels(operation).observe(time.monotonic() - start)
        return wrapper
    return decorator

# Derived metrics to alert on:
# hit_rate = rate(cache_ops{result="hit"}[5m]) / rate(cache_ops_total[5m])
# ALERT: hit_rate < 0.80 for 5 minutes
# ALERT: cache_memory_bytes / cache_max_bytes > 0.90 (approaching eviction pressure)

Redis built-in stats

redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys|used_memory"

# keyspace_hits:   10450921
# keyspace_misses:  524051     → hit_rate = 10450921 / (10450921 + 524051) = 95.2%
# evicted_keys:     1203       → eviction happening — consider increasing maxmemory
# used_memory_human: 2.50G
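The derived hit-rate arithmetic, as a tiny helper (a sketch; the counter values are the ones shown above):

```python
def redis_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Hit rate from Redis INFO stats counters; 0.0 when there is no traffic yet."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

print(f"{redis_hit_rate(10450921, 524051):.1%}")  # 95.2%
```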

---

Multi-language Implementation

Python — Redis with connection pool

import redis.asyncio as aioredis
import json
from functools import wraps

redis_pool = aioredis.ConnectionPool.from_url(
    "redis://localhost:6379",
    max_connections=20,
    decode_responses=True
)
redis_client = aioredis.Redis(connection_pool=redis_pool)

def cached(key_fn, ttl=300):
    """Decorator for cache-aside pattern."""
    def decorator(fn):
        @wraps(fn)
        async def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            hit = await redis_client.get(key)
            if hit:
                return json.loads(hit)
            result = await fn(*args, **kwargs)
            if result is not None:
                await redis_client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached(key_fn=lambda user_id: f"user:{user_id}", ttl=300)
async def get_user(user_id: int) -> dict | None:
    row = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    return dict(row) if row else None

Go — Redis with go-redis

package cache

import (
    "context"
    "encoding/json"
    "time"

    "github.com/redis/go-redis/v9"
)

type Cache struct {
    client *redis.Client
}

func New(addr string) *Cache {
    return &Cache{
        client: redis.NewClient(&redis.Options{
            Addr:         addr,
            PoolSize:     20,
            MinIdleConns: 5,
            DialTimeout:  5 * time.Second,
            ReadTimeout:  3 * time.Second,
            WriteTimeout: 3 * time.Second,
        }),
    }
}

func (c *Cache) GetOrLoad(ctx context.Context, key string, loader func() (any, error), ttl time.Duration) (any, error) {
    val, err := c.client.Get(ctx, key).Result()
    if err == nil {
        var result any
        if jerr := json.Unmarshal([]byte(val), &result); jerr != nil {
            return nil, jerr // corrupt cache entry
        }
        return result, nil // cache hit
    }
    if err != redis.Nil {
        return nil, err // redis error
    }

    // Cache miss — load from source
    value, err := loader()
    if err != nil {
        return nil, err
    }

    if data, merr := json.Marshal(value); merr == nil {
        // Best-effort cache write — a marshal or cache failure should not fail the read
        c.client.SetEx(ctx, key, data, ttl)
    }
    return value, nil
}

Node.js — ioredis

const Redis = require('ioredis');

const redis = new Redis({
    host: 'localhost',
    port: 6379,
    maxRetriesPerRequest: 3,
    retryStrategy: (times) => Math.min(times * 50, 2000),
    lazyConnect: false,
});

class Cache {
    constructor(redis, defaultTTL = 300) {
        this.redis = redis;
        this.defaultTTL = defaultTTL;
    }

    async getOrLoad(key, loader, ttl = this.defaultTTL) {
        const cached = await this.redis.get(key);
        if (cached !== null) {
            return JSON.parse(cached);  // cache hit
        }

        const value = await loader();
        if (value !== null && value !== undefined) {
            await this.redis.setex(key, ttl, JSON.stringify(value));
        }
        return value;
    }

    async invalidate(key) {
        await this.redis.del(key);
    }

    async invalidatePattern(pattern) {
        // Use SCAN to avoid blocking Redis with KEYS
        const pipeline = this.redis.pipeline();
        let cursor = '0';
        do {
            const [nextCursor, keys] = await this.redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
            cursor = nextCursor;
            keys.forEach(key => pipeline.del(key));
        } while (cursor !== '0');
        await pipeline.exec();
    }
}

const cache = new Cache(redis);
const getUser = (userId) =>
    cache.getOrLoad(`user:${userId}`, () => db.query('SELECT * FROM users WHERE id = $1', [userId]));

---

Trade-offs

Strategy        Consistency      Complexity   Write Cost   Best For
───────────────────────────────────────────────────────────────────────────
Cache-Aside     Eventual (TTL)   Low          None         Most read-heavy workloads
Read-Through    Eventual (TTL)   Medium       None         Centralizing cache logic
Write-Through   Strong           Medium       2Ɨ write     Data read right after writing
Write-Behind    Eventual         High         Near-zero    Extreme write throughput

Cache size vs hit rate trade-off:

Working set size estimate (for Zipf with s=1):
  Cache 10% of items → ~63% hit rate
  Cache 20% of items → ~74% hit rate
  Cache 50% of items → ~88% hit rate

Diminishing returns: doubling cache size increases hit rate by decreasing amounts.

---

Failure Modes

1. Cold start / cache stampede:

After deploy, cache is empty. All requests hit DB simultaneously. DB CPU spikes, queries slow down, connections exhaust. Write-through or pre-warming at deploy time prevents this.

async def warm_cache():
    """Pre-populate cache with hot data before accepting traffic."""
    rows = await db.fetchall(
        "SELECT id FROM users ORDER BY last_active DESC LIMIT 10000"
    )
    for row in rows:
        user_id = row["id"]
        user = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
        await redis.setex(f"user:{user_id}", 3600, json.dumps(dict(user)))

2. Inconsistency window:

TTL-based caching means stale reads for TTL seconds after a write. For financial data (balances, inventory), stale reads are dangerous.

Mitigation: Event-driven invalidation. On write, immediately DEL the key.

3. Memory pressure and eviction:

When Redis hits maxmemory, it begins evicting keys. If your eviction policy is allkeys-lru, hot keys may be evicted. If noeviction, new writes fail.

# Set eviction policy
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET maxmemory 2gb

4. Cache key collision:

Two different data types sharing a key namespace causes data corruption.

# BAD: user:1 could be either a User or a UserPreference
cache.set("user:1", user_data)
cache.set("user:1", user_prefs)  # overwrites!

# GOOD: namespace by entity type
cache.set("user:profile:1", user_data)
cache.set("user:prefs:1", user_prefs)

5. Serialization mismatch:

Storing JSON-serialized objects and deserializing into the wrong type causes silent data bugs.
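One defensive pattern (an illustrative sketch, not from the text): embed a type tag in every serialized payload and verify it on read, so a value cached under the wrong key fails loudly instead of silently deserializing into the wrong type.

```python
import json

def dumps_tagged(entity_type: str, payload: dict) -> str:
    """Serialize with an embedded type tag."""
    return json.dumps({"_type": entity_type, "data": payload})

def loads_tagged(raw: str, expected_type: str) -> dict:
    """Deserialize and verify the type tag; raise on mismatch."""
    envelope = json.loads(raw)
    if envelope.get("_type") != expected_type:
        raise TypeError(
            f"cache entry is {envelope.get('_type')!r}, expected {expected_type!r}"
        )
    return envelope["data"]

raw = dumps_tagged("user_profile", {"id": 1, "name": "Ada"})
profile = loads_tagged(raw, "user_profile")       # ok
# loads_tagged(raw, "user_prefs") would raise TypeError
```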

---

When NOT to Use

Strongly consistent data:

Bank balances, inventory counts, distributed locks — these require the DB as the single source of truth. Caching introduces inconsistency windows that cause incorrect reads.

Highly write-heavy data:

If a key is written 100Ɨ per second and read 10Ɨ per second, caching increases write overhead with near-zero hit rate. Cache read-heavy paths only.

Large objects:

Caching 100MB objects in Redis wastes memory. Store large blobs in object storage (S3/GCS) and serve them through a CDN.

Personal/sensitive data at rest:

Caching PII in Redis adds attack surface. Ensure Redis has auth, TLS, and network isolation. Consider not caching sensitive fields.

---

Lab

The lab for this module is embedded in the benchmarks directory:

../../benchmarks/06-cache-vs-no-cache/README.md

It demonstrates the hit rate vs latency trade-off with a simulated Zipf workload.

---

Key Takeaways

  1. The cache equation: effective_latency = hit_rate Ɨ cache_latency + (1 - hit_rate) Ɨ origin_latency. Hit rate is the dominant variable.
  2. Zipf distribution makes caches work: 20% of items get 80% of reads. Cache that 20%.
  3. LRU is O(1) for both get and put using doubly linked list + hash map. Redis approximates it with 5-sample random eviction.
  4. Cache placement: Cache-Aside for most cases. Write-Through for write+read hot data. Write-Behind for extreme write throughput.
  5. Thundering herd on expiry is a real production failure mode. Use single-flight mutex or probabilistic early revalidation (PER).
  6. Invalidation is hard. TTL is simple but eventually consistent. Event-driven is complex but correct.
  7. Bloom filters eliminate DB queries for non-existent keys — essential for attack mitigation (random key probing).
  8. Observe: track hit rate, eviction rate, memory usage. Alert on hit_rate < 80% and memory > 90%.

šŸ“š Related Topics