interviews / anthropic-practice / 20
DAG Task Scheduler
Topological sort — the single most reused graph pattern. Course Schedule, build order, package install, job scheduling: all the same problem.
The one sentence: in-degree 0 = ready to run now; finishing a task decrements its
dependents' in-degree; if you can't drain every node, there's a cycle. That covers
ordering, cycle detection, and parallel scheduling.
When to reach for it
"Must happen before" · "depends on" · "prerequisites" · "build / install order" · "can you
finish all" · "minimum time with N workers." Anything with ordering under constraints.
1 · Kahn (BFS on in-degree) — the default
Detects cycles for free, and the level-by-level drain gives parallel scheduling. O(V + E).
from collections import deque
def topo_order(tasks, deps): # deps = [(a, b)] → a before b
adj = {t: [] for t in tasks}
indeg = {t: 0 for t in tasks}
for a, b in deps:
adj[a].append(b); indeg[b] += 1
q = deque(t for t in tasks if indeg[t] == 0) # no prereqs → ready
order = []
while q:
t = q.popleft(); order.append(t)
for nxt in adj[t]:
indeg[nxt] -= 1
if indeg[nxt] == 0: q.append(nxt) # gated: all prereqs done
if len(order) != len(tasks):
raise ValueError("cycle — not a DAG")
return order
Variants — trivial extensions of the same loop
| Want | Change |
| Schedulable? (Course Schedule I) | return len(order) == len(tasks) |
| Min time, unlimited workers | drain the whole ready set per tick; count ticks = critical-path length |
| Order returned (Course Schedule II) | that's topo_order itself |
# parallel time: one full level = one time unit
time = 0
while q:
for _ in range(len(q)): # drain the level you have right now
t = q.popleft()
for nxt in adj[t]:
indeg[nxt] -= 1
if indeg[nxt] == 0: q.append(nxt)
time += 1
2 · Real scheduler — dependencies + workers + priorities
The impressive version. Engine = two heaps + a free-worker pool (event-driven sim).
O((V + E) log V).
ready : min-heap by (priority, task) — what could start now.
running : min-heap by (finish_time, task, worker) — jump time to the next finish.
import heapq
def schedule_workers(tasks, deps, num_workers, duration=None, priority=None):
duration = duration or {t: 1 for t in tasks}
priority = priority or {t: 0 for t in tasks} # lower = run first
adj = {t: [] for t in tasks}
indeg = {t: 0 for t in tasks}
for a, b in deps:
adj[a].append(b); indeg[b] += 1
ready = [(priority[t], t) for t in tasks if indeg[t] == 0]
heapq.heapify(ready)
free, running, schedule, now = list(range(num_workers)), [], {}, 0
while ready or running:
while ready and free: # fill every free worker, highest priority first
_, t = heapq.heappop(ready)
w = free.pop()
schedule[t] = (now, now + duration[t], w)
heapq.heappush(running, (now + duration[t], t, w))
if not running: break
now = running[0][0] # jump to next finish
while running and running[0][0] == now:
_, t, w = heapq.heappop(running)
free.append(w) # worker freed
for nxt in adj[t]:
indeg[nxt] -= 1
if indeg[nxt] == 0: heapq.heappush(ready, (priority[nxt], nxt))
if len(schedule) != len(tasks):
raise ValueError("cycle — not a DAG")
makespan = max((e for _, e, _ in schedule.values()), default=0)
return schedule, makespan
Loop logic
Fill free workers from
ready → jump
now to the next
finishing task → free those workers + release their dependents → repeat. A cycle shows up as
"ready empties, running empties, but tasks remain" → caught by the length check.
3 · DFS variant (if they ask)
Recursive post-order + 3-color marks. WHITE = unseen, GRAY = on
the current stack (a GRAY neighbor = back edge = cycle), BLACK = done. Append post-order, reverse.
def topo_order_dfs(tasks, deps):
adj = {t: [] for t in tasks}
for a, b in deps: adj[a].append(b)
WHITE, GRAY, BLACK = 0, 1, 2
color, out = {t: WHITE for t in tasks}, []
def visit(t):
color[t] = GRAY
for nxt in adj[t]:
if color[nxt] == GRAY: raise ValueError("cycle")
if color[nxt] == WHITE: visit(nxt)
color[t] = BLACK; out.append(t) # post-order
for t in tasks:
if color[t] == WHITE: visit(t)
out.reverse(); return out
Relationship to the web crawler
Same skeleton, opposite relationship to cycles
Same skeleton (directed graph + visited/in-degree + queue),
opposite relationship to cycles.
The crawler
dedups to survive cycles (a visited set); topo sort
forbids cycles to produce an order
(the in-degree counter). Crawling has no ordering constraint, so it's plain BFS — never bring topo
machinery to a coverage problem.
Say it in the room: "In-degree 0 = ready; pick by priority into free workers; advance time
to the next finish, which frees a worker and may make dependents ready." That one loop scales from
Course Schedule to a real job scheduler.