DAG Task Scheduler — Topological Sort

The one sentence: in-degree 0 = ready to run now; finishing a task decrements its dependents' in-degree; if you can't drain every node, there's a cycle. That covers ordering, cycle detection, and parallel scheduling.

When to reach for it

"Must happen before" · "depends on" · "prerequisites" · "build / install order" · "can you finish all" · "minimum time with N workers." Anything with ordering under constraints.

1 · Kahn (BFS on in-degree) — the default

Detects cycles for free, and the level-by-level drain gives parallel scheduling. O(V + E).

from collections import deque def topo_order(tasks, deps): # deps = [(a, b)] → a before b adj = {t: [] for t in tasks} indeg = {t: 0 for t in tasks} for a, b in deps: adj[a].append(b); indeg[b] += 1 q = deque(t for t in tasks if indeg[t] == 0) # no prereqs → ready order = [] while q: t = q.popleft(); order.append(t) for nxt in adj[t]: indeg[nxt] -= 1 if indeg[nxt] == 0: q.append(nxt) # gated: all prereqs done if len(order) != len(tasks): raise ValueError("cycle — not a DAG") return order

Variants — trivial extensions of the same loop

Want	Change
Schedulable? (Course Schedule I)	return `len(order) == len(tasks)`
Min time, unlimited workers	drain the whole ready set per tick; count ticks = critical-path length
Order returned (Course Schedule II)	that's `topo_order` itself

# parallel time: one full level = one time unit time = 0 while q: for _ in range(len(q)): # drain the level you have right now t = q.popleft() for nxt in adj[t]: indeg[nxt] -= 1 if indeg[nxt] == 0: q.append(nxt) time += 1

2 · Real scheduler — dependencies + workers + priorities

The impressive version. Engine = two heaps + a free-worker pool (event-driven sim). O((V + E) log V).

ready : min-heap by (priority, task) — what could start now.
running : min-heap by (finish_time, task, worker) — jump time to the next finish.

import heapq def schedule_workers(tasks, deps, num_workers, duration=None, priority=None): duration = duration or {t: 1 for t in tasks} priority = priority or {t: 0 for t in tasks} # lower = run first adj = {t: [] for t in tasks} indeg = {t: 0 for t in tasks} for a, b in deps: adj[a].append(b); indeg[b] += 1 ready = [(priority[t], t) for t in tasks if indeg[t] == 0] heapq.heapify(ready) free, running, schedule, now = list(range(num_workers)), [], {}, 0 while ready or running: while ready and free: # fill every free worker, highest priority first _, t = heapq.heappop(ready) w = free.pop() schedule[t] = (now, now + duration[t], w) heapq.heappush(running, (now + duration[t], t, w)) if not running: break now = running[0][0] # jump to next finish while running and running[0][0] == now: _, t, w = heapq.heappop(running) free.append(w) # worker freed for nxt in adj[t]: indeg[nxt] -= 1 if indeg[nxt] == 0: heapq.heappush(ready, (priority[nxt], nxt)) if len(schedule) != len(tasks): raise ValueError("cycle — not a DAG") makespan = max((e for _, e, _ in schedule.values()), default=0) return schedule, makespan

Loop logic

Fill free workers from ready → jump now to the next finishing task → free those workers + release their dependents → repeat. A cycle shows up as "ready empties, running empties, but tasks remain" → caught by the length check.

3 · DFS variant (if they ask)

Recursive post-order + 3-color marks. WHITE = unseen, GRAY = on the current stack (a GRAY neighbor = back edge = cycle), BLACK = done. Append post-order, reverse.

def topo_order_dfs(tasks, deps): adj = {t: [] for t in tasks} for a, b in deps: adj[a].append(b) WHITE, GRAY, BLACK = 0, 1, 2 color, out = {t: WHITE for t in tasks}, [] def visit(t): color[t] = GRAY for nxt in adj[t]: if color[nxt] == GRAY: raise ValueError("cycle") if color[nxt] == WHITE: visit(nxt) color[t] = BLACK; out.append(t) # post-order for t in tasks: if color[t] == WHITE: visit(t) out.reverse(); return out

Relationship to the web crawler

Same skeleton, opposite relationship to cycles

Same skeleton (directed graph + visited/in-degree + queue), opposite relationship to cycles. The crawler dedups to survive cycles (a visited set); topo sort forbids cycles to produce an order (the in-degree counter). Crawling has no ordering constraint, so it's plain BFS — never bring topo machinery to a coverage problem.

Say it in the room: "In-degree 0 = ready; pick by priority into free workers; advance time to the next finish, which frees a worker and may make dependents ready." That one loop scales from Course Schedule to a real job scheduler.