summaryrefslogtreecommitdiffstats
path: root/HACKING
diff options
context:
space:
mode:
authorKaz Kylheku <kaz@kylheku.com>2014-10-30 07:49:07 -0700
committerKaz Kylheku <kaz@kylheku.com>2014-10-30 07:49:07 -0700
commit0580d0373fc9b9b6a84cfb5749257b095e610e73 (patch)
tree3b8d065dec821f0856313f47305911b579a35dd1 /HACKING
parent6eca4a9313fb8af95d1f0ea961b351aaba487a1e (diff)
downloadtxr-0580d0373fc9b9b6a84cfb5749257b095e610e73.tar.gz
txr-0580d0373fc9b9b6a84cfb5749257b095e610e73.tar.bz2
txr-0580d0373fc9b9b6a84cfb5749257b095e610e73.zip
Implementing finalization hooks.
* gc.c (struct fin_reg): New struct type. (final_list, final_tail, mark_makefresh): New static variables. (mark_obj): Under generational GC, if make_makefresh is in effect, set the generation to -1 on all marked objects. (sweep_one): In an EXTRA_DEBUGGING build, call breakpt if the object being swept is the one in break_obj. Under generational GC, place reachable objects that are in generation -1 the freshobj nursery and assign them to generation 0, rather than sticking them into the mature generation 1. (sweep): Under generational gc, reset the freshobj_idx variable here, so that sweep_one has an empty nursery in which to place the generation -1 objects. (prepare_finals, call_finals): New static functions. (gc): Call prepare_finals before sweep, and call call_finals just before re-enabling GC and returning. Do not reset freshobj_idx to zero; this was done in sweep, which may have added entries into it. (gc_finalize): New function. (gc_late_init): Register gc_finalize as intrinsic function finalize. * txr.1: Documented finalize. * HACKING: Documented finalization, described the additional meaning of the -1 generation, and added a section on debugging with break_obj and breakpt.
Diffstat (limited to 'HACKING')
-rw-r--r--HACKING193
1 files changed, 146 insertions, 47 deletions
diff --git a/HACKING b/HACKING
index d6983fb4..52a879f0 100644
--- a/HACKING
+++ b/HACKING
@@ -5,40 +5,43 @@ CONTENTS:
SECTION LINE
-0. Overview 44
-
-1. Coding Practice 51
-1.2 Program File Structure 74
-1.3 Style 88
-1.3 Error Handling 150
-1.4 I/O 163
-1.5 Type Safety 173
-1.6 Regression 215
-
-2. Dynamic Types 224
-2.1 Two Kinds of Values 231
-2.1 Pointer Bitfield 242
-2.2 Heap Objects 265
-2.3 The COBJ type 279
-2.4 Strings 296
-2.4.1 Encapsulated C Strings 311
-2.4.2 Representation Hacks for 2 Byte wchar_t 355
-
-3. Garbage Collection 414
-3.1 Root Pointers 431
-3.2 GC-safe Code 454
-3.2.1 Rule One: Full Initialization 480
-3.2.2 Rule Two: Make it Reachable 509
-3.3 Weak Reference Support 644
-3.4 Generational GC 687
-3.4.2 Representation of Generations 696
-3.4.3 Basic Algorithm 716
-3.4.4 Handling Backpointers 751
-
-4. Debugging 828
-4.2. Debugging the Yacc-generated Parser 959
-4.3. Debugging GC Issues 972
-4.4. Valgrind: Your Friend 995
+0. Overview 47
+
+1. Coding Practice 54
+1.2 Program File Structure 77
+1.3 Style 91
+1.3 Error Handling 153
+1.4 I/O 166
+1.5 Type Safety 176
+1.6 Regression 218
+
+2. Dynamic Types 227
+2.1 Two Kinds of Values 234
+2.1 Pointer Bitfield 245
+2.2 Heap Objects 268
+2.3 The COBJ type 288
+2.4 Strings 305
+2.4.1 Encapsulated C Strings 320
+2.4.2 Representation Hacks for 2 Byte wchar_t 364
+
+3. Garbage Collection 423
+3.1 Root Pointers 441
+3.2 GC-safe Code 464
+3.2.1 Rule One: Full Initialization 490
+3.2.2 Rule Two: Make it Reachable 519
+3.3 Weak Reference Support 654
+3.4 Finalization 697
+3.5 Generational GC 721
+3.5.2 Representation of Generations 730
+3.5.3 Basic Algorithm 766
+3.5.4 Handling Backpointers 801
+3.5.5 Generational GC and Finalization 879
+
+4. Debugging 908
+4.2. Debugging the Yacc-generated Parser 1039
+4.3. Debugging GC Issues 1052
+4.4 Object Breakpoint 1075
+4.5 Valgrind: Your Friend 1094
0. Overview
@@ -274,6 +277,12 @@ Though the type tag is defined by a enumeration, for memory management
purposes, the type field is overloaded with additional bitmasked values.
The FREE flag in a type field marks an object on the free list.
The REACHABLE flag is the marking bit used during garbage collection.
+There is also a MAKEFRESH flag, which is used to in conjunction with
+REACHABLE to indicate that an object, though reachable, should be
+kept in, or demoted to to the baby object generation. The implementation of
+user-defined finalization handlers uses this to prevent finalized objects from
+becoming mature, just because they were made reachable for the purposes of the
+finalization callback.
2.3 The COBJ type
@@ -428,6 +437,7 @@ reclaimed (indicating a bug in the garbage collection system), uses
of that object will see a bad type tag, so that there is a good chance
the program will throw an exception due to a failed type check.
+
3.1 Root Pointers
The marking phase of the garbage collector looks in two places for root
@@ -684,16 +694,40 @@ due to the lapsing of weak keys thus stay unmarked and are reclaimed during
the sweep phase of the gc, which soon follows.
-3.4 Generational GC
+3.4 Finalization
+
+Finalization (user-defined finalization hooks associated with objects) is
+implemented in a fairly straightforward way, with a slight complication
+having to do with generational GC (see below).
+
+Finalization uses a simple global list of registrations which is processed
+during every garbage collection pass, in two phases. First, just after
+the regular garbage collection marking phase and weak hash table processing,
+the finalization registration list is walked twice. First, those entries in it whose
+registered objects are still unreachable are flagged for later processing,
+then in the second pass over the list, all objects are given "new lease on
+life" by being marked as reachable. The registered functions in all entries are
+marked as reachable also. Next, the garbage collection sweep pass takes place
+to reclaim all unreachable objects and reset the GC-related flags in all
+objects. When this is done, it becomes safe to call the finalization functions.
+The list of registrations is walked once again, and the previously flagged
+entires are called out and expunged. The list is walked in a safe way which
+allows the called handlers to register new finalizations. The newly registered
+finalizations are combined with the unflagged, unexpunged previous
+registrations into a new list, which will be processed at the next garbage
+collection pass.
+
-3.4.1 Preprocessor Delimitation
+3.5 Generational GC
+
+3.5.1 Preprocessor Delimitation
Currently, the generational GC code is delimited by #ifdef CONFIG_GEN_GC.
So to understand what the differences are between the regular GC and
generational, one just has to read those sections dependent on that
preprocessor symbol.
-3.4.2 Representation of Generations
+3.5.2 Representation of Generations
Generational garbage collectors are typically copying collectors. In a copying
system, objects can be segregated into generations by their physical location.
@@ -706,14 +740,30 @@ generation.
There are only three generation values: -1, 0 and 1. Generation 0 indicate a
"baby" object. The value 1 indicates a mature object. A freshly allocated
-object is put into generation 0. The generation value -1 is used to mark baby
-objects which have been stored into the checkobj array, or generation 1
-objects which have been placed into the mutobj array, to prevent them
-from being added to these arrays twice. These arrays have to do with
-backreferences from old to young objects, and are described a few paragraphs
-below.
-
-3.4.3 Basic Algorithm
+object is put into generation 0. The generation value -1 has three different
+meanings:
+ 1. It is used to mark baby (generation 0) objects which have been stored into
+ the checkobj array, so that they are not placed there twice.
+ 2. It is used to mark mature (generation 1) objects which have been placed
+ into the mutobj array, also so they are not put there twice.
+ 3. It is used during marking to flag reachable objects which should not be
+ placed into, or remain in, generation 1.
+The checkobj and mutobj arrays have to do with handling backreferences from old
+to young objects, and are described a few paragraphs below. Meaning 3 has
+to do with interaction between finalization and generational GC, also described
+below. These different uses of -1 don't interefere because when the checkobj
+and mutobj arrays are processed, which happens early the marking phase,
+those objects are changed from generation -1 to generation 0. (A temporary
+situation, done in the knowledge that all reachable objects in generation 0
+that are processed in the sweep phase will go to generation 1). The
+third meaning of -1 is used in the last phase of marking having to do with
+finalization (the prepare_finals function), which applies this -1 generation
+only to objects that have been left unreachable by all previous marking, and
+are reachable only thorugh the finalizer registration list. After this phase,
+only these special objects can possibly have generation -1.
+
+
+3.5.3 Basic Algorithm
When an object is newly allocated, it is not only assigned generation 0, but is
appended into the freshobj array. This array allows the garbage collector to
@@ -748,7 +798,7 @@ GC saves time by avoiding doing full marking (terminating whenever it meets a
generation 1 object) and avoiding a full sweep (processing only the freshobj
array).
-3.4.4 Handling Backpointers
+3.5.4 Handling Backpointers
Under generational GC, there is the problem that objects in generation 1 can
be destructively changed (mutated) so that they point to baby objects. This is
@@ -825,6 +875,36 @@ of now pointing to some babies via its newly assigned elements. The generational
garbage collection pass then naturally deals with visiting the individual the
elements to mark those which are babies.
+
+3.5.5 Generational GC and Finalization
+
+There is an interaction between generational GC and TXR's finalization
+support. When objects registered for finalization are processed, they
+and their functions have to be reinstanted as reachable objects, so that the
+finalization handlers can safely execute. Under plain mark-and-sweep GC,
+these objects will be collected in the next garbage collection pass, since
+they will be found to be unreachable again (unless their finalization handler
+reinstated them into the reachability graph!) and this time they will not
+registered for finalization any more. Under generational GC, objects which
+are marked as reachable, however, pass into the mature generation.
+If this happens for objects which are being finalized, a silly situation
+occurs in which objects known to be unreachable, and which are only
+temporary made reachable for finalization, are being promoted to, or
+retained in, the mature generation, so that they won't be reclaimed until the
+next full garbage collection pass. To prevent this silly situation, objects
+which are marked as reachable during finalization processing are assigned
+to generation -1. During a generational GC, all such objects are generation
+0 objects, but during a full GC, this could include generation 1 objects
+also. Then, during the sweep phase, objects which carry this flag are assigned
+to generation 0 and are placed into the freshobj array nursery, as if they were
+just freshly allocated. Doing this is safe even for objects that were
+previously generation 1, because since the objects had just been found to be
+unreachable, this means that no references to them exist from any other live
+object, and that implies that no backreferences exist to them from the mature
+generation. (This takes place before the finalization handlers are called
+whcih could introduce such backreferences.)
+
+
4. Debugging
4.1. Using gdb
@@ -992,7 +1072,26 @@ dynamic objects are registered with the garbage collector and are precisely
traced. What isn't precisely traced is the call stack and machine context.
-4.4. Valgrind: Your Friend
+4.4 Object Breakpoint
+
+If TXR is compiled with -DEXTRA_DEBUGGING=1 then two global symbols are defined
+which make it possible to catch GC-related traversals of a particular object.
+
+When compiled with EXTRA_DEBUGGING defined as 1, TXR provides a function called
+breakpt. The purpose of this function is to serve as a target of a debugger
+breakpoint; when interesting situations happen, TXR calls that function.
+
+An EXTRA_DEBUGGING build also provides a global variable called break_obj, of
+type val. This normally holds the value nil, which is uninteresting. The
+variable can be set, from within the debugger, to hold an arbitrary object.
+When that object is visited during garbage collection and in certain other
+interesting situations such as hash table weak processing, the breakpt
+function is called. Using this object breakpoint feature, it is possible
+to investigate various issues, such as spurious retention: how is a particular
+object being reached.
+
+
+4.5 Valgrind: Your Friend
To get the most out running txr under valgrind, build it with valgrind support.
Of course, you have to have the valgrind development stuff installed (so