Jones, Richard; Lins, Rafael
Date finished: 1999-03-09
The first time you encounter garbage collection (GC, for short) is often by using Lisp or Scheme in an introductory computer science course, where it can seem an almost magical technology. How does the interpreter determine that a variable has become garbage, and can safely be deallocated? This book explains how GC works, in great detail, and interest people implementing garbage collectors or who've wondered how it's done. There are 3 approaches to GC, all of which are covered in this book:
- Reference counting: Every object keeps a count of the references pointing to it. When new references are created, the counter must be increased; when references are destroyed, the counter is decremented, and, if the count is zero, the object has become garbage and can be reclaimed. Reference counting unfortunately exacts the most overhead from constantly incrementing and decrementing the counters, and in its simplest form it can't collect cyclic data structures, because their count will never reach zero.
- Mark-sweep: Traverse the graph of object references, starting from the roots of the computation; roots might be the program's global variables, the contents of the C stack, static class variables, or a main namespace of some sort. Each object that's traversed is marked as being reachable; when the traversal is complete, any unmarked objects are unreachable because they can't be reached by the program any longer, and can be reclaimed as garbage.
- Copying: Divide memory into two halves, called "Fromspace" and "Tospace". Memory for new objects is always allocated out of Fromspace. When Fromspace fills up, a traversal similar to the one used by mark-sweep is done, but instead of just marking objects as reachable, they're copied into Tospace. Once the traversal is done, all the live objects are now in Tospace, and Fromspace no longer contains any live data; only garbage objects can be left in Fromspace. Tospace is now declared the new Fromspace, and new objects are allocated until it fills up; the algorithm is then repeated.
This book covers the above 3 algorithms for GC in great detail; there's a chapter for each algorithm, examining its limitations, common implementation techniques, and historical notes on the its development. Later chapters cover more advanced collectors that extend the basic algorithms in various ways. Generational collectors are based on the observation that many objects are temporary and have short lifetimes, while some few objects may last for the entire lifetime of the program. A generational garbage collector concentrates on young objects, because they will occupy the most recyclable space, and spends less time examining old objects that probably won't be reclaimed. Incremental and concurrent collectors don't bring the whole program to a halt while they scavenge through memory doing a collection, making them more suited for interactive or even real-time applications.
Chapter 9 is an interesting examination of the implementation of GC in an unfriendly environment, namely C programs. The Boehm-Demer-Weiser collector implements a mark-sweep garbage collector that can replace C's malloc() function, scanning stack frames and registers for potential pointers, and managing to do it fairly efficiently. One study found the Boehm-Demer-Weiser collector added 20% more execution time overhead than malloc() did -- that is, the time spent in memory allocation was 20% longer. (That doesn't mean it made programs 20% slower, because most programs spend more time computing results than they spend allocating memory). A garbage collector for C is a tricky job, complicated by the lack of type information, data values that may mimic pointers, and compiler optimizations that may render objects invisible to the collector. Barlett's Mostly Copying Garbage Collector, also described in this chapter, is more liberal and requires some assistance from the C programmer in exchange for greater accuracy.
The remaining 3 chapters consider suggestions for adding garbage collection to C++, the interaction of GC with cache memory, and distributed garbage collection. The topics of these chapters are still subjects of active research, so the final chapters are lighter on implementation details and heavy on references to research papers, making them a lot less interesting.
The authors are computer scientists specializing in GC (Richard Jones maintains a garbage collection page), and their coverage is quite complete, which at times makes for exceedingly dry reading as the authors enumerate what seems like every minor implementation variation and every relevant paper. On the other hand, if you're actually faced with implementing a garbage collector, the pointers into the research literature will be very useful. A casual reader (like me) can simply skim, or even skip, pages until the next interesting section arrives.
%T Garbage Collection %S Algorithms for Automatic Dynamic Memory Management %@ 1999-03-09 %A Jones, Richard %A Lins, Rafael %K computing