station;temperature), compute min/mean/max per station, as fast as possible. The parsing is trivial; the memory and concurrency are everything, so it's a live-fire test of both primers (concurrency · memory models). Grounded in a real Swift implementation (brennanMKE/BillionRowChallenge), with an honest principal-level critique.

`readLine()` + `Dictionary` + `Double` ≈ minutes; a tuned version ≈ 1–2 s. ~100×, entirely from the systems techniques in the two primers.

The implementation memory-maps the file and parses it in place — exactly the zero-copy story (memory primer §1.7):
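
A minimal sketch of that idea follows; it is not the reference implementation's code. It assumes Foundation's `Data(contentsOf:options:)` with `.alwaysMapped` as the mapping mechanism (the reference may call `mmap` directly), and `countLines(atPath:)` is a hypothetical helper that only walks the mapped bytes to show that nothing is copied out of the mapping.

```swift
import Foundation

/// Hypothetical helper: counts "\n" bytes in a file by scanning the mapping in place.
func countLines(atPath path: String) throws -> Int {
    // .alwaysMapped asks Foundation to map the file rather than read it into heap memory.
    let data = try Data(contentsOf: URL(fileURLWithPath: path), options: .alwaysMapped)
    return data.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) -> Int in
        var lines = 0
        // Each byte is read straight from the mapped pages; no String or Array is built.
        for byte in bytes where byte == UInt8(ascii: "\n") {
            lines += 1
        }
        return lines
    }
}
```

The buffer pointer is only valid inside the closure, so any per-line work has to happen there (or on ranges derived there) rather than on copies taken out of it.
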

`findNextNewline`: segment 0 starts at byte 0, every other segment skips its partial first line, and each extends its end to the next newline so no line is split across workers. This is the classic chunk-boundary off-by-a-line bug, handled properly.

`mmap` to avoid copies… then copy every line into a `String` one line later. The reference does both of these, which makes it the perfect teaching contrast.

`String` (two allocations) per line. `.map` builds a throwaway `Array<UInt8>`; `String(bytes:encoding:)` allocates again and copies. This is the heap (memory primer §1.8) on the hottest possible path: the mmap zero-copy win is undone immediately.

`String(cString:length:)` is failable → returns `nil` on any invalid UTF-8 byte → the iterator returns `nil` → the segment ends early, silently dropping the rest of its data.

`AsyncSequence` for in-memory parsing. `async` adds a suspension point a billion times for nothing. `async` here is pure overhead, not a win.

`String`. Parse the bytes in place, key the map on the
raw name bytes, parse the temperature as an integer, and fan out one
synchronous worker per core. Every change maps to a primer concept.

| Fix | Why it's faster | Primer concept |
|---|---|---|
| Parse bytes in place (no `String`) | kills 2 allocs + 2 copies per line; stays on the cache-hot mmap pages | memory §1.7 zero-copy / §1.8 heap-avoidance |
| Key map on raw name bytes | ~400 distinct stations → the only allocations are bounded (one per new station), not per line | memory §1.8 (heap = the bounded, shared thing) |
| Integer temp parse (xx.x → int*10) | integer ALU, no float parse/round; the NN-engine-style "avoid the slow unit" move | memory §1.5 (execution unit) |
| Open-addressing hash map, sized to fit cache | 1B lookups → a cache miss per lookup is the whole runtime; keep it in L1/L2 | memory §1.5 ("cache misses are the enemy") |
| `concurrentPerform`, 1 worker/core, private maps | CPU-bound, never blocks; no locks during parse; merge once | concurrency §6.5 (≈core-count, no exhaustion) + §7 (private state → no race) |
| Synchronous, not `async` | no suspension points on work that never suspends | concurrency §6.5 (async only pays off on real I/O) |

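
Several rows of the table can be sketched together. The block below is illustrative only and is not the reference code: `Slot`, `StationTable`, `parseTenths`, and `aggregate` are hypothetical names, the hash is FNV-1a chosen for brevity, and the table assumes a power-of-two capacity comfortably larger than the number of stations (no resizing is sketched). It also assumes LF-terminated records with exactly one fractional digit.

```swift
/// Running aggregate for one station; temperatures are integer tenths of a degree.
struct Slot {
    var key: [UInt8] = []          // station name bytes, copied once on first sight
    var used = false
    var min = Int.max, max = Int.min, sum = 0, count = 0
}

/// Hypothetical fixed-capacity open-addressing table keyed on raw name bytes.
struct StationTable {
    private(set) var slots: [Slot]
    private let mask: Int

    init(capacity: Int = 1 << 12) {
        precondition(capacity > 0 && capacity & (capacity - 1) == 0, "capacity must be a power of two")
        slots = Array(repeating: Slot(), count: capacity)
        mask = capacity - 1
    }

    mutating func record(name: UnsafeRawBufferPointer, tenths: Int) {
        var hash: UInt64 = 14_695_981_039_346_656_037      // FNV-1a offset basis
        for byte in name { hash = (hash ^ UInt64(byte)) &* 1_099_511_628_211 }
        var index = Int(truncatingIfNeeded: hash) & mask
        // Linear probing: step to the next slot until we hit this name or an empty slot.
        while slots[index].used && !slots[index].key.elementsEqual(name) {
            index = (index + 1) & mask
        }
        if !slots[index].used {
            slots[index].used = true
            slots[index].key = Array(name)                  // allocation happens once per new station
        }
        slots[index].min = Swift.min(slots[index].min, tenths)
        slots[index].max = Swift.max(slots[index].max, tenths)
        slots[index].sum += tenths
        slots[index].count += 1
    }
}

/// "-12.3" -> -123, "4.5" -> 45. Assumes ASCII digits with one fractional digit.
func parseTenths(_ bytes: UnsafeRawBufferPointer, _ range: Range<Int>) -> Int {
    var value = 0
    var negative = false
    for i in range {
        let byte = bytes[i]
        if byte == UInt8(ascii: "-") { negative = true }
        else if byte != UInt8(ascii: ".") { value = value * 10 + Int(byte) - 48 }
    }
    return negative ? -value : value
}

/// Walks newline-terminated "name;temp" records without building a String.
/// A final record with no trailing newline is ignored in this sketch.
func aggregate(_ bytes: UnsafeRawBufferPointer, into table: inout StationTable) {
    var lineStart = 0
    var semicolon = -1
    for i in 0..<bytes.count {
        if bytes[i] == UInt8(ascii: ";") {
            semicolon = i
        } else if bytes[i] == UInt8(ascii: "\n") {
            if semicolon > lineStart {
                let name = UnsafeRawBufferPointer(rebasing: bytes[lineStart..<semicolon])
                table.record(name: name, tenths: parseTenths(bytes, (semicolon + 1)..<i))
            }
            lineStart = i + 1
        }
    }
}
```

The mean is recovered only at output time as `Double(sum) / Double(count) / 10`, so floating point never appears on the per-line path.
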
`DispatchQueue.concurrentPerform(iterations: cores)` (GCD's parallel-for)
is the natural fit: it runs exactly core-count iterations of uninterrupted work and
joins. A `TaskGroup` works too, but you'd gain nothing and must be sure no
parse step blocks a cooperative-pool thread (the forward-progress rule, §6.5). When
the work is "saturate every core with non-blocking compute and reduce," GCD's
parallel-for is the right tool: a case where newer isn't automatically better.

Each worker produces a private station map; merge folds them into one.
This is the same shape as the distributed-training all-reduce (the inference/training
prep): parallel, independent local work → one synchronization barrier at the
end. No locks during the parse (the expensive 99.99%); a single merge over ~400
stations × N workers (trivial) at the close. Contend only where contention is cheap.
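
The fan-out and single merge described above can be sketched as follows. This is an illustration under stated assumptions, not the reference code: `Stats`, `nextLineStart`, `parseSegment`, and `aggregate` are hypothetical names, input is assumed to be LF-terminated with one fractional digit, and the per-worker map uses a plain `Dictionary` keyed on copied name bytes to keep the sketch short (a tuned version would hash the bytes in place instead).

```swift
import Foundation
import Dispatch

struct Stats {
    var min = Int.max, max = Int.min, sum = 0, count = 0
    mutating func add(_ tenths: Int) {
        min = Swift.min(min, tenths); max = Swift.max(max, tenths)
        sum += tenths; count += 1
    }
    mutating func merge(_ other: Stats) {
        min = Swift.min(min, other.min); max = Swift.max(max, other.max)
        sum += other.sum; count += other.count
    }
}

/// Index just past the first "\n" at or after `position` (or `bytes.count` if none).
func nextLineStart(in bytes: UnsafeRawBufferPointer, from position: Int) -> Int {
    var i = position
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "\n") { return i + 1 }
        i += 1
    }
    return bytes.count
}

/// Parses one newline-aligned segment into a map private to the calling worker.
func parseSegment(_ bytes: UnsafeRawBufferPointer, _ range: Range<Int>) -> [[UInt8]: Stats] {
    var local: [[UInt8]: Stats] = [:]
    var lineStart = range.lowerBound
    var semicolon = -1
    for i in range {
        let byte = bytes[i]
        if byte == UInt8(ascii: ";") {
            semicolon = i
        } else if byte == UInt8(ascii: "\n") {
            if semicolon > lineStart {
                var tenths = 0
                var negative = false
                for j in (semicolon + 1)..<i {          // "-12.3" -> -123
                    let c = bytes[j]
                    if c == UInt8(ascii: "-") { negative = true }
                    else if c != UInt8(ascii: ".") { tenths = tenths * 10 + Int(c) - 48 }
                }
                // Simplification: copying the key allocates per line.
                local[Array(bytes[lineStart..<semicolon]), default: Stats()].add(negative ? -tenths : tenths)
            }
            lineStart = i + 1
        }
    }
    return local
}

func aggregate(_ bytes: UnsafeRawBufferPointer, workers: Int) -> [[UInt8]: Stats] {
    precondition(workers >= 1)
    // Boundary k is the first line start at or after the even split point, so every
    // segment begins on a line start and ends just past a newline.
    let bounds: [Int] = (0...workers).map { k in
        k == 0 ? 0 : (k == workers ? bytes.count : nextLineStart(in: bytes, from: k * bytes.count / workers))
    }
    var partials = [[[UInt8]: Stats]](repeating: [:], count: workers)
    partials.withUnsafeMutableBufferPointer { buffer in
        let slots = buffer                               // each worker writes only slots[k]
        DispatchQueue.concurrentPerform(iterations: workers) { k in
            slots[k] = parseSegment(bytes, bounds[k]..<bounds[k + 1])
        }
    }
    // The only synchronization point: fold the private maps after all workers have joined.
    var merged: [[UInt8]: Stats] = [:]
    for partial in partials {
        for (name, stats) in partial { merged[name, default: Stats()].merge(stats) }
    }
    return merged
}
```

No lock appears anywhere: each iteration owns its segment and its output slot, and the merge runs on one thread once `concurrentPerform` has returned.
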

`AsyncSequence` over resident memory adds suspension overhead with no suspension benefit: the forward-progress rule, read the other way.

`concurrentPerform` beats a `TaskGroup` here precisely because the work never blocks. Match the tool to the work shape, not the calendar.

Reference implementation: brennanMKE/BillionRowChallenge (Swift). Critique and the improved approach are this author's; the "fix" code is illustrative (pointer/byte-parsing sketch), not a drop-in. Companion primers: Apple Systems & Concurrency · Apple Memory Models.