Overview
What this concept solves
Mark & sweep is the original tracing collector, and the mental model behind almost every modern GC. Instead of asking each object "how many things point at you?", it asks one global question: starting from the roots, what can I actually reach? Everything reachable is alive. Everything else is garbage, by definition.
It runs in two phases. Mark does a graph traversal from the roots — local variables, globals, registers — following every pointer and setting a marked bit on each object it visits. Sweep then walks the entire heap linearly and frees every object whose bit is still clear, resetting the bits on the survivors for next time.
Because it's built on reachability, it collects cycles for free: a cycle that no root can reach is simply never marked, so sweep reclaims it like any other garbage. That's the property reference counting could never give you.
Mechanics
How it works
Two phases, one reachability question
- Mark — seed the worklist with the roots. Push every root-referenced object onto a worklist (or recursion stack).
- Mark — drain the worklist. Pop an object, set its
markedbit, and push each object it points to that isn't already marked. Repeat until the worklist is empty. Every live object is now marked. - Sweep — scan the whole heap. Walk linearly through every object. If
markedis set, clear it (ready for the next cycle). If it's clear, the object is unreachable — free it. - Resume. The marks are reset, the free list has reclaimed the dead, and the program continues.
It's just a graph traversal
Mark is breadth-first or depth-first search over the object graph, rooted at the GC roots. A cycle is a non-issue: the first time you reach an object in the cycle you mark it, and the if not already marked check stops you from looping forever. This is exactly why mark & sweep handles the cycles that defeat reference counting.
Two real costs: pauses and fragmentation
Classic mark & sweep is stop-the-world — the program must freeze during marking, or it could rewire pointers out from under the collector. And sweep only frees objects in place; it never moves survivors. Over time the heap becomes a patchwork of live objects and reclaimed holes — fragmentation — which can starve a large allocation even when plenty of total memory is free. Mark-compact and copying collectors exist to fix exactly this.
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
A tracing collector in two phases. Mark: start at the roots and walk every pointer, painting each object you reach as live (green). Sweep: scan the whole heap and free anything still unmarked (red). Pick a scenario and step through. Scenario 2 shows a cycle collected with ease — reachability, not counting, is what matters. Scenario 3 shows a self-referential island that's still garbage because no root reaches it.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Trace a clean heap
Run scenario 1 (Basic trace) and step through the mark phase. Note which objects light up green as you follow pointers from the roots — and which never get touched. Predict the unmarked set before you reach sweep; those are exactly the ones that turn red.
Collect a cycle
Run scenario 2 (Cycle wins). Two objects point at each other but no root reaches them. Step to the end and confirm both are swept. Compare this with the reference-counting prototype, where the identical cycle leaked forever.
Spot the floating island
Run scenario 3 (Floating island). A little self-connected group of objects looks 'busy' — full of internal pointers — but no root can reach it. Watch mark skip the whole island, then sweep reclaim it. Internal connectivity never implies liveness; only root-reachability does.
In practice
When to use it — and what you give up
When to reach for it
- You need to collect cycles correctly — anything reference counting alone can't handle.
- You want a simple, well-understood baseline — mark & sweep is the teaching default and the foundation most production collectors extend.
- Object movement is undesirable or impossible — because sweep leaves survivors in place, addresses are stable, which matters for code holding raw interior pointers (some C/C++ interop, conservative collectors).
- Pauses are acceptable, or you'll later make it incremental/concurrent — the basic algorithm pairs naturally with the tri-color techniques later in this track.
Real-world example
Ruby's MRI used a stop-the-world mark & sweep for years before adding incremental and generational improvements. Boehm GC — the drop-in conservative collector for C and C++ — is a mark & sweep. And the 'mark' half is the literal core of the JVM's CMS and G1, V8's old-space collector, and Go's collector; they all start from mark & sweep and add concurrency, compaction, and generations on top.
Pros
- Collects cycles — reachability ignores how objects point at each other.
- No per-write overhead — unlike reference counting, normal pointer assignments cost nothing; the work is batched into the GC cycle.
- Survivors don't move — addresses stay stable, simplifying interop with native code.
- Conceptually simple — a graph traversal plus a linear scan. Easy to reason about and to extend.
Cons
- Stop-the-world pauses — the basic algorithm freezes the program for the whole mark + sweep, and pause time scales with heap size.
- Fragmentation — frees in place, leaving holes that can defeat large allocations.
- Sweep touches the whole heap — even the dead, cold pages, hurting cache and paging behavior.
- Work is proportional to heap size, not garbage — you pay to walk live and dead alike.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Mark & sweep: trace reachability from the roots, then free the rest.
class GcObject {
marked = false;
refs: GcObject[] = []; // outgoing pointers
}
class MarkSweepHeap {
private heap: GcObject[] = []; // every allocated object
private roots: GcObject[] = []; // locals, globals, registers
collect(): void {
this.mark();
this.sweep();
}
/** Phase 1: paint everything reachable from the roots. */
private mark(): void {
const worklist = [...this.roots]; // BFS seed
while (worklist.length) {
const obj = worklist.pop()!;
if (obj.marked) continue; // already visited — breaks cycles
obj.marked = true;
for (const child of obj.refs) {
if (!child.marked) worklist.push(child);
}
}
}
/** Phase 2: linear scan — free the unmarked, reset the survivors. */
private sweep(): void {
const survivors: GcObject[] = [];
for (const obj of this.heap) {
if (obj.marked) {
obj.marked = false; // reset for the next cycle
survivors.push(obj);
} else {
// ...return obj's memory to the free list here (left in place).
}
}
this.heap = survivors;
}
}References & further reading
4 sources- Bookgchandbook.org
The Garbage Collection Handbook — Mark-Sweep (Ch. 2)
Jones, Hosking & Moss. The canonical description, including the mark-bit table and lazy sweeping.
- Docshboehm.info
Boehm-Demers-Weiser conservative GC
A production mark & sweep you can drop into a C or C++ program. Great for seeing the algorithm in real code.
- Articleen.wikipedia.org
Wikipedia — Tracing garbage collection
Covers the naive mark-and-sweep cycle, the mark bit, and tri-color refinements.
- Articleheroku.com
Ruby — Incremental Garbage Collection in Ruby 2.2
Koichi Sasada on how MRI's stop-the-world mark & sweep grew generational (2.1) and incremental (2.2) low-pause improvements.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
What does the 'mark' phase actually do?
question 02 / 03
Why does mark & sweep collect cycles that reference counting leaks?
question 03 / 03
After many mark & sweep cycles, a program fails to allocate a large object even though total free memory is more than enough. Why?
0/3 answered
Was this concept helpful?
Tell us what worked, or what to improve. We read every note.