Redis Distributed Systems Primer

ZSET patterns for timers, queues, rate limiters, and more. One data structure — a sorted set — solves most distributed coordination problems once you know what the score means.

1 · The big idea 2 · ZSET internals 3 · Core commands 4 · Pattern: timer / scheduler 5 · Pattern: priority queue 6 · Pattern: rate limiter 7 · Atomicity = your lock 8 · Durability 9 · Scaling 10 · Interview cheat sheet

1 · The big idea

Redis ZSET (sorted set) solves 90% of distributed coordination problems with one data structure. The whole trick is understanding what the score represents — change the meaning of the score and the same primitive becomes a timer, a queue, a rate limiter, or a leaderboard.

Timer / Scheduler

score = fire_timestamp

ZRANGEBYSCORE 0 NOW

Priority Queue

score = priority (inverted)

ZPOPMIN

Rate Limiter

score = request_timestamp

ZCOUNT key (now-window) now

Leaderboard

score = points

ZREVRANGE 0 9

Delayed Queue

score = visible_at

ZRANGEBYSCORE 0 NOW

Sliding Window

score = event_timestamp

ZREMRANGEBYSCORE 0 (now-window)

Interview insight

These "different" problems are the same data structure with different semantics. Recognizing this pattern shows you see underlying structure, not just memorized solutions.

2 · ZSET internals — why a sorted set, not a heap

A ZSET is a skip list + hash table. The dual structure gives you the best of both worlds: the skip list keeps members ordered by score for ranges and ranks, the hash table gives O(1) score lookup by member.

Operation	Heap	ZSET
Get min/max	O(1)	O(1)
Insert	O(log n)	O(log n)
Extract min	O(log n)	O(log n)
Range query (score 100–200)	O(n)	O(log n + m)
Get rank of X	O(n)	O(log n)
Get score of X	O(n)	O(1)
Update score	O(n) find + O(log n)	O(log n)

O(log n) search like a balanced tree · range scans just walk the base list forward (cache friendly) · simpler than a red-black tree and concurrency-friendly

3 · Core commands

ZADD — add member with score

ZADD timers 1708700000 "job:abc123" key score member (fire_at) (timer_id)

Stores the member with its score. If the member already exists, the score is updated in place.

ZRANGEBYSCORE — get members in a score range

ZRANGEBYSCORE timers 0 1708700000 LIMIT 0 100 key min max limit # Returns all members with score between 0 and now # Perfect for "give me all due timers"

ZREM — remove member (atomic)

ZREM timers "job:abc123" key member (NOT the score!) # Returns: 1 if removed, 0 if it didn't exist # This return value is your distributed lock!

Key insight

ZREM is atomic because Redis is single-threaded. Two workers calling ZREM on the same member — only one gets 1. No distributed lock needed.

ZPOPMIN — atomic pop (get + remove)

ZPOPMIN priority_queue 1 key count # Returns AND removes the lowest-scored member # Atomic — perfect for priority queue consumers

Other useful commands

Command	Purpose
`ZSCORE key member`	Get score of member — O(1)
`ZRANK key member`	Get rank (position) of member
`ZCOUNT key min max`	Count members in score range
`ZREVRANGE key 0 9`	Top 10 (highest scores first)
`ZREMRANGEBYSCORE key min max`	Remove all in score range
`ZINCRBY key increment member`	Atomically increment score

4 · Pattern: timer / scheduler — score = fire_at

The problem

Schedule callbacks to fire at specific times. Millions of timers, distributed workers, at-least-once delivery.

# Producer: schedule a timer timer_id = f"retention:{evidence_id}" fire_at = time.time() + (7 * 365 * 24 * 3600) # 7 years redis.ZADD("timers", {timer_id: fire_at}) # Store payload separately redis.HSET("timer_payloads", timer_id, json.dumps({ "action": "delete", "evidence_id": evidence_id }))

# Consumer: poll and process while True: now = time.time() # Get due timers (peek, don't remove yet) due = redis.ZRANGEBYSCORE("timers", 0, now, limit=100) for timer_id in due: # Atomic claim — only one worker wins if redis.ZREM("timers", timer_id) == 1: payload = redis.HGET("timer_payloads", timer_id) execute(payload) redis.HDEL("timer_payloads", timer_id) # else: another worker claimed it sleep(0.1)

At-least-once delivery

What if a worker dies after ZREM but before execute? Move the claim to a processing set with a visibility timeout instead of deleting it outright.

# Move to "processing" set instead of removing redis.ZREM("timers", timer_id) redis.ZADD("processing", {timer_id: now + 30}) # 30s timeout execute(payload) redis.ZREM("processing", timer_id) # Only after success # Separate reaper process: # scans "processing", moves expired items back to "timers"

5 · Pattern: priority queue — score = priority (inverted)

The problem

Multiple producers, multiple consumers. Higher-priority items processed first.

# Producer: enqueue with priority # Lower score = higher priority (ZPOPMIN gets lowest) priorities = {"CRITICAL": 0, "HIGH": 100, "MEDIUM": 200, "LOW": 300} job_id = f"job:{uuid.uuid4()}" redis.ZADD("job_queue", {job_id: priorities["HIGH"]})

# Consumer: atomic pop while True: result = redis.ZPOPMIN("job_queue", 1) if result: job_id, priority = result[0] process(job_id) else: sleep(0.1) # Queue empty

Preventing starvation

Use score = priority * 1_000_000 + timestamp. Within the same priority band, items stay FIFO by arrival time.

score = priorities["LOW"] * 1_000_000 + time.time() # LOW jobs still ordered by arrival time

6 · Pattern: rate limiter — sliding window

def is_allowed(user_id, limit=100, window=60): key = f"ratelimit:{user_id}" now = time.time() window_start = now - window # Remove old entries redis.ZREMRANGEBYSCORE(key, 0, window_start) # Count requests in window count = redis.ZCOUNT(key, window_start, now) if count < limit: # Add this request redis.ZADD(key, {f"{now}:{uuid.uuid4()}": now}) redis.EXPIRE(key, window) # Cleanup return True return False

Race condition

ZCOUNT + ZADD is not atomic. For strict enforcement, wrap both in a Lua script — or accept a slight over-limit under contention.

7 · Atomicity: your distributed lock

Redis is single-threaded for command execution. Each command completes fully before the next starts. That's the whole basis of "no distributed lock needed" — the return value of an atomic mutating command is the lock.

Two workers, same timer

Both workers see the same due timer, but Redis serializes their ZREMs — only one gets 1. The return value is the lock.

Atomic commands for coordination

Command	Atomic guarantee
`ZREM`	Only one caller gets 1
`ZPOPMIN`	Returns AND removes atomically
`SETNX`	Set only if not exists
`INCR`	Increment and return
`GETDEL`	Get and delete in one op

Not atomic

if ZSCORE(x): ZREM(x) — two commands means a race condition. Use a single command or a Lua script.

8 · Durability

Redis persistence options

Mode	How	Data loss	Performance
None	RAM only	Everything	Fastest
RDB	Snapshot every N sec	Last N seconds	Fast
AOF (everysec)	Append log, fsync/sec	~1 second	Good
AOF (always)	fsync every write	None	Slow

Tiered storage strategy

For critical long-term timers, keep cold timers in Postgres and promote them into Redis only as they approach their fire time.

Cold timers live cheaply in Postgres; an hourly job promotes imminent ones into Redis where workers poll them.

Interview sound bite

"Redis is durable with AOF, but I'd tier it — Postgres for timers more than a day out, Redis for imminent. Worst case Redis dies, I lose a second of timers, and idempotent handlers mean a double-fire is safe."

9 · Scaling — when single Redis isn't enough

Approach	How	Tradeoff
Shard by time bucket	`timers:2026-02-23-14`	Natural partitioning, but must scan multiple keys
Shard by hash	`timers:{hash(id) % N}`	Even distribution, but lose global ordering
Redis Cluster	Automatic sharding	A single ZSET can't span slots
Tiered storage	DB for cold, Redis for hot	Complexity, but handles billions

Hot key problem

90% of traffic is HIGH priority? That single key becomes a hot shard. Split it across several keys and have consumers fan out over all of them.

# Split hot priority across multiple keys shard = hash(job_id) % 4 key = f"queue:HIGH:{shard}" # Consumer reads from all shards for shard in range(4): result = redis.ZPOPMIN(f"queue:HIGH:{shard}", 1)

10 · Interview cheat sheet

Problem	Say
Timer / Scheduler	"ZSET, score = fire_time, poll with ZRANGEBYSCORE, claim with ZREM"
Priority Queue	"ZSET, score = priority (lower = higher), consume with ZPOPMIN"
Rate Limiter	"ZSET, score = timestamp, ZCOUNT for window, ZREMRANGEBYSCORE to clean"
Distributed Lock	"ZREM return value — Redis is single-threaded, only one caller gets 1"
At-least-once	"Move to processing set with visibility timeout, reaper moves back if expired"
Durability	"AOF everysec for ~1s loss, or tier to Postgres for critical long-term"
Scaling	"Shard by time bucket or hash, tier cold data to DB"

The meta-point

The best distributed systems are simple primitives composed carefully: ZSET = skip list + hash. Atomicity = single thread. Lock = check the return value. No magic, just layers.