The Compiler-Base library

The Compiler-Base library exports the basic data structures used by the compiler. It also provides a number of miscellaneous utility modules. Every other library relies on Compiler-Base library.

Miscellaneous Utility Modules

Several modules in Compiler-Base library appear to contain assorted utility routines used elsewhere in the compiler.

Common

This module imports a number of standard modules and re-exports some or all of their constants. Essentially, this module is used to specify a local Dylan dialect for use by the rest of the compiler. It includes Dylan, some of Extensions, a few names from Table-Extensions and several of the I/O-related modules.

Utils

This appears to be the "grab bag" module. It holds a mix of completely unrelated minor classes and functions.

The OD-Format Module

Introduction

The OD-Format module was designed by Rob MacLachlan and implements the Object Description Format, a binary format for storing persistent objects. This is used by the *.lib.du files emitted by d2c, though it might be potentially useful elsewhere.

The file gd/src/d2c/compiler/base/od-format.dylan contains an extensive discussion the module's goals and data format.

The basic requirement of the "Object Description Format" (ODF) is to allow an open-ended variety of objects to be dumped into a file, where:

  • instances may refer to each other in an arbitrary graph structure, and

  • some of the references may be to objects that are defined in distinct object description units (files, etc.)

In order to make the design interesting, the ODF attempts to satisfy various incompatible requirements:

  • To support efficient parsing and unparsing on a variety of architectures, but also to potentially serve as a cross-platform interchange format.

  • To work well on sequential media (like a socket), but also to potentially support random-access demand-loading of objects when possible (e.g. on files.)

We should consider making this into a standalone library at some point in the future.

Dumping Objects


<data-unit-name>[Constant]

Things that can be the name of a data unit.

Value

<symbol>

Description

<data-unit-name> is a <symbol> because is must be comparible with \==.


$library-summary-unit-type[Constant]

A constant for begin-dumping

Value

0

Description

This is currently the only type of OD format allowed. To build another format (e.g. DOOD), a new exported constant must be added to OD-Format and the file extension must be added to $unit-type-strings (an internal constant to OD-Format).


begin-dumping[Function]

Returns a dump state used by other OD functions.

Synopsis

begin-dumping (name, type, #key where) => (dump-state)

Parameters

nameAn instance of <data-unit-name>. The thing to dump.
typeAn instance of <integer>. The "unit type code"
where:An instance of false-or(<string>). Dump file name. Defaults to name.<type translation>.du .

Return Values

dump-stateAn instance of <dump-state>.

Description

Creates a state (which acts very much like a buffer) for other dumping functions to use to build the dump file in memory. The end-dumping function actually writes the built information to file.


end-dumping[Function]

Flushes the dump state to file.

Synopsis

end-dumping (state) => ()

Parameters

stateAn instance of <dump-state>.

Return Values

None.

Description

Actually output the file.


dump-od[Open Generic]

Write an object to the dump buffer.

Synopsis

dump-od (object, buffer) => ()

Parameters

objectAn instance of <object>. The object to dump.
bufferAn instance of <dump-buffer>. The in-memory buffer into which the object should be dumped.

Return Values

None.

Description

This routine is overloaded to dump individual types objects. Methods for primitive Dylan types can be found in the Section called The Dylan-Dump Module.


<dump-state>[Class]

Holder of the in-progress object dump.

Superclasses

<dump-buffer>

Initialization Keywords

None.

Description

Contains all information about the current object-dump in progress. It is illegal to use a dump-state after it has been passed to end-dumping. All instances of <dump-state> are created by calls to begin-dumping.

Loading Dumped Objects


<dispatcher>[Class]

Binary to in-memory instance transformer.

Superclasses

<object>

Initialization Keywords

initialize-from:An instance of <object>. Unless #f, initialize instance's dispatch information from this dispatcher. Defaults to *default-dispatcher*.

Description

A dispatcher is an object that tells the loader how to convert binary data into Dylan objects.


*default-dispatcher*[Variable]

Default dispatcher used for initialization.

Type

<dispatcher>

Description

Dylan-dump loads this variable with dispatching information for several commonly-used Dylan types (see the Section called The Dylan-Dump Module).


*data-unit-search-path*[Variable]

Paths to du files.

Type

<sequence>

Value

#[]

Description

This is used by find-data-unit to locate the du file to load.


<location-hint>[Constant]

A hint of where to find a data unit.

Value

<byte-string>

Description

This is currently an absolute(?) pathname.


<word>[Constant]

A portable, full-sized <integer>.

Value

<integer>

Description

It seems mindy has a limited size for <integer>, so, for mindy, <word> is a <general-integer> to allow for proper bit manipulations that do not clobber the value via overflow.


find-data-unit[Method]

Locates then loads a du file.

Synopsis

find-data-unit (name, type, #key location-hint, check-hash, dispatcher) => (res)

Parameters

nameAn instance of <data-unit-name>.
typeAn instance of <integer>. (Currently only $library-summary-unit-type allowed).
location-hint:An instance of false-or(<location-hint>). File name for the dump file; used if the data unit cannot be found in the current *data-unit-search-path*.
check-hash:An instance of false-or(<word>). Can be used to detect version mismatches between data units (used internally as expected-hash)
dispatcher:An instance of <dispatcher>. Provides the dispatcher to be used for the loading process. Defaults to *default-dispatcher*.

Return Values

resAn instance of <data-unit>. Contains the resulting transformed objects from the du file.

Description

Find-data-unit, when given a data unit name and type, returns the corresponding data unit. This is the main entry for causing loading to happen. Also used recursively.

Defining Simple Dumpers and Loaders

To write a dumper for a class, one needs to think of a name for the class (usually just take the <>'s off the class name), register the name with register-object-id, and define a dumper. If your data structure contains any object sharing or circular usages, also see the Section called Circular Usage and the Section called Object Sharing. For automatic generation of object dumpers and loaders, see the Section called Autodump.


register-object-id[Method]

Associates a number for a type for the OD dumping functions.

Synopsis

register-object-id (name, id) => ()

Parameters

nameAn instance of <symbol>. The name of the type.
idAn instance of <integer>. The associated integral value.

Return Values

None.

Description

Equate the name with the id. This function is not exported; rather, all register-object-id calls must occur in the OD-Format module.

Of course, this requirement makes it really tough for extending OD-Format for user defined types ...


add-od-loader[Function]

Register a method for reading a dumped object back into memory.

Synopsis

add-od-loader (dispatcher, name, func) => ()

Parameters

dispatcherAn instance of <dispatcher>. When loading .lib.du files, this is the *default-dispatcher*
nameAn instance of <symbol>. The method will be activated when the loader encounters the type registered with register-object-id in the du file.
funcAn instance of <function>. The loader method.

Return Values

None.

Description

This function is the "other half" of dump-od. It is called to register a function for reloading dumped objects, which has the signature:

method(state :: <load-state>) => res :: <object-type-read-in>

If you use add-od-loader, you will need to define a corresponding method on dump-od (see the Section called Dumping Objects).


add-make-dumper[Method]

Convenience method to add both a loader and dumper.

Synopsis

add-make-dumper (name, dispatcher, obj-class, slots, #key load-external, dumper-only, load-side-effect, dump-side-effect) => ()

Parameters

nameAn instance of <symbol>. Type name.
dispatcherAn instance of <dispatcher>. Where the loader and dumper are being added.
obj-classAn instance of <class>. Must be the direct class of the instances we are to dump (and of course, must also be instantiable via make.)
slotsAn instance of <list>. The slots' descriptions of the class.
load-external:An instance of <boolean>.
dumper-only:An instance of <boolean>. If #t, do not create the loader.
load-side-effect:An instance of false-or(<function>).
dump-side-effect:An instance of false-or(<function>).

Return Values

None.

Description

A convenient way to define both a loader and a dumper for objects of class obj-class. Slots is a list which is in this format:

#(slot-accessor-function-1, init-keyword-1, setter-function-1,
  slot-accessor-function-2, init-keyword-2, setter-function-2,
  ...)

Note that the stuff is implicitly grouped in clumps of 3. Either of the init-keyword or the setter may be #f, but not both. The "make dumper" requests backpatching when it encounters an unresolved forward reference (due to circularities). Any slot which might have a forward reference must have a setter function (it can also have an init-keyword, which will be used if possible).

A "make dumper" may be used when an object can be reconstructed from some accessor values by reinstatiating the object with make and backpatching some slots. This is pretty general, since the accessor and setter function are not required to be slot accessors and setters.

Autodump

[Description of Autodump coming soon]

ODF Representation of Objects

In the dump files, objects contain the following parts: a header, raw data, sub-objects, and an end-of-object marker. The header is a single machine word which contains the class of the object. (Actually, it contains the object-id, which is found by registry look-up.) The header also indicates if there is any raw data and sub-objects. The raw data section is zero or more machine words which only have meaning to the object's loader. If raw data is present, it is preceded by a byte-count (which is a single machine word). If there are sub-objects, they come next. The last object in a sub-object list is a magical $end-object, which denotes both the end of the sub-object list and the last word of the current object.

For more detail on the binary representation and on issues such as endian-ness and machine word size, see the Section called Binary Representation.

When an object is loaded, the loading code reads in the header. The object class id is extracted from the header. The dispatcher then provides a load function based on the class id, which reads the remainder of the data associated with the object and returns the reconstructed object.

Functions for Writing Dumpers

When adding a method to dump-od use the methods described here to push out the objects correctly.


dump-definition-header[Method]

Dump an object definition header.

Synopsis

dump-definition-header (name, buf, #key subobjects, raw-data) => ()

Parameters

nameAn instance of <symbol>.
bufAn instance of <dump-buffer>. Probably best to use the <dump-state> instance passed into dump-od
subobjects:An instance of <object>. #t means this object has subobjects Defaults to #f.
raw-data:An instance of <object>. The Raw-Format field in an object-definition header encodes information needed to byte/word format translation, and in the degenerate case, indicates there is no raw data at all. Allowable format codes are $odf-no-raw-data-format, $odf-byte-raw-data-format, $odf-16bit-raw-data-format, $odf-32bit-raw-data-format, $odf-64bit-raw-data-format, $odf-untranslatable-raw-data-format, and $odf-word-raw-data-format (which is 32 or 64 as appropriate, used for <word> in format pseudo-objects). Defaults to $odf-no-raw-data-format.

Return Values

None.

Description

The first dump call in dump-op for simple objects, this function dumps the object's header to the buffer. If one is dumping compound objects, first save the position of the buf before calling this method with some code like this:

define method dump-od(obj :: <ratio>, buf :: <dump-state>) => ();
  let start-pos = buf.current-pos;
  dump-definition-header(#"ratio", buf, subobjects: #t);
  dump-od(obj.numerator, buf);
  dump-od(obj.denominator, buf);
  dump-end-entry(start-pos, buf);
end method dump-od;

This is necessary so that dump-end-entry (see the next documented method) writes out the terminal correctly.


dump-end-entry[Method]

Dump an "end of object" marker. This is only necessary if there are subobjects.

Synopsis

dump-end-entry (start-posn, buf) => ()

Parameters

start-posnAn instance of <integer>. Where the object began.
bufAn instance of <dump-buffer>. Usually the <dump-state> instance passed into dump-od.

Return Values

None.

Description

Before dumping the compound object, be sure to save the buffer's current position (as in the sample code for dump-definition-header) because this method needs that information.


dump-simple-object[Method]

Dump an object that has only a constant number of subobjects (no raw data).

Synopsis

dump-simple-object (name, buf, #rest slots) => ()

Parameters

nameAn instance of <symbol>. The instance's class name.
bufAn instance of <dump-buffer>. Usually the <dump-state> instance passed into dump-od.
slotsInstances of <object>. The values of the slots of this object.

Return Values

None.

Description

This method provides a simple way to dump objects of known size, so using this method, writing dump-od for <ratio> would be:

define method dump-od(obj :: <ratio>, buf :: <dump-state>) => ();
  dump-simple-object(#"ratio", buf, obj.numerator, obj.denominator);
end method dump-od;

dump-word[Method]

Dump an integer into the buffer as one word.

Synopsis

dump-word (obj, buf) => ()

Parameters

objAn instance of <word>. The integer to dump.
bufAn instance of <dump-buffer>.

Return Values

None.

Description

This dumps an <integer> as part of of an object. If you want to dump an <integer> as an <integer>, use the dump-od on <integer> in the Section called The Dylan-Dump Module.


dump-raw-data[Method]

Dumps a collection into the buffer.

Synopsis

dump-raw-data (obj, bsize, buf) => ()

Parameters

objAn instance of <collection>. The collection to dump.
bsizeAn instance of <integer>. The number of elements in the collection.
bufAn instance of <dump-buffer>.

Return Values

None.

Description

This take a collection, obj, and dumps its contents to buf. This method is usually called in connection with dumping a complex object, for example, dump-od for <byte-string> in the Section called The Dylan-Dump Module dumps strings thusly:

define method dump-od(obj :: <byte-string>, buf :: <dump-buffer>) => ();
  dump-definition-header(#"byte-string", buf,
  			 raw-data: $odf-byte-raw-data-format);
  dump-raw-data(obj, obj.size, buf);
end method;

Functions for Writing Loaders

The basic concept of writing a loader is to read in the information you wrote out with the method on dump-od. This process (writing loaders) is a bit different than writing dumpers, though. Here, you provide a method to add-od-loader (described in the Section called Defining Simple Dumpers and Loaders) that is activated by a <dispatcher> instance when it encounters the object in the dump file to load.

A loader on <byte-string> (the corresponding dump-od method is found in the Section called Functions for Writing Dumpers) would look like this:

add-od-loader(*default-dispatcher*, #"byte-string", 
  method (state :: <load-state>)
   => res :: <byte-string>;
    let next = state.od-next;
    let bsize = buffer-word(state.od-buffer, next);
    state.od-next := next + $word-bytes;
    let res = make(<byte-string>, size: bsize);
    load-raw-data(res, bsize, state);
    res;
  end method
);

(In this case, the call to add-od-loader is made at the top-level in dylan-dump.dylan.) The functions used to build the dispatch method are described below.


<load-state>[Class]

A load-state is an object that describes the current state of the loading process. Among other things, it contains an input <stream> and the buffer to the stream.

Superclasses

<object>

Initialization Keywords

stream:An instance of <stream>. The stream we're reading from. Only used for refilling the buffer. This slot is accessed via the od-stream method.
buffer:An instance of <buffer>. State of the stream buffer, which we hold for the duration of the load. It is accessed via the od-buffer method.
next:An instance of <buffer-index>. The index of the state's stream; it is accessed via the od-next method and can be set (changed) using the od-next-setter method.
end:An instance of <buffer-index>. The end-index for the buffer and stream. This slot is accessed via the od-end method.
position-offset:An instance of <integer>. When added to the od-next pointer, this offset yields the byte offset of our current position from the data-unit start. This must be updated whenever we refill the buffer.
dispatcher:An instance of <dispatcher>.

Description

A <load-state> object is passed into the dispatching method to reconstruct a dumped object (so, you do not need to make a <load-state> instance). Use that object in conjuction with the functions described below.


$end-object[Constant]

An end-of-subobjects marker.

Type

<end-object>

Description

A magical object which is always the last sub-object in an object.


load-object-dispatch[Method]

Read an object header and invoke the appropriate loader. Return the object that the loader returns.

Synopsis

load-object-dispatch (state) => (res)

Parameters

stateAn instance of <load-state>.

Return Values

resAn instance of <object>. The reconstructed object.

Description

Start loading some objects from a load-state. Dispatches to an appropriate loader method depending on the dispatcher and the entry type. Returns the object.

For example, to load a <ratio> instance, build a dispatching method like this:

add-od-loader(*default-dispatcher*, #"ratio", 
  method (state :: <load-state>) => res :: <ratio>;
    
    let npart = load-object-dispatch(state);
    let dpart = load-object-dispatch(state);
    assert-end-object(state);
    ratio(npart, dpart);
  end method
);

load-raw-data[Method]

Loads data into a parameter.

Synopsis

load-raw-data (res, elsize, state) => (nnext)

Parameters

resAn instance of <byte-string>. A pre-allocated buffer where the raw data gets loaded.
elsizeAn instance of <integer>. Size of the buffer (or the maximum bytes to load).
stateAn instance of <load-state>.

Return Values

nnextAn instance of <buffer-index>. The position in the <load-state>'s buffer after loading the raw data.

Description

Loads raw data into res. It is up to the dispatching method to interpret that loaded raw data in order to reconstruct the object's state.

Usually a call to load-raw-data is to load a string, so reconstruction is easy (see the example at the top of this section). One thing of note is that this method is not functional: res is not returned, but is a destructively modified parameter.


load-subobjects-vector[Method]

Returns a vector of an object's subobjects.

Synopsis

load-subobjects-vector (state, size-hint) => (res)

Parameters

stateAn instance of <load-state>.
size-hintAn instance of false-or(<integer>). If a number, then the number of subobjects MUST match.

Return Values

resAn instance of <simple-object-vector>. The subobjects of the object.

Description

Loads the dumped subobjects into a vector. Note that the subobjects may be unresolved when loaded with this function ... it is the responsibility of the user of this function to resolve those objects (see the Section called Circular Usage for more information on functions for object resolution such as obj-resolved?, actual-obj, and request-backpatch).

So, getting a <vector> from this function still requires programming effort, e.g.:

add-od-loader(*default-dispatcher*, #"simple-object-vector", 
  method (state :: <load-state>) => res :: <simple-object-vector>;
    let contents = load-subobjects-vector(state, #f);
    let rsize = contents.size;
    let res = make(<simple-object-vector>, size: rsize);
    for (i from 0 below rsize)
      let el = contents[i];
      if (el.obj-resolved?)
        res[i] := el.actual-obj;
      else
        res[i] := el;
	request-backpatch(el, method (actual) res[i] := actual end);
      end;
    end;
    res;
  end method
);

Fortunately, the Dylan-dump module provides this dispatching method. See the Section called The Dylan-Dump Module.


load-sole-subobject[Method]

Loads the only subobject of this object.

Synopsis

load-sole-subobject (state) => (res)

Parameters

stateAn instance of <load-state>.

Return Values

resAn instance of <object>. The resolved subobject.

Description

Load a one-slot object. The value is returned, and we check that there was in fact only one subobject.


assert-end-object[Method]

Reads from the buffer, and signals an error if the next thing read isn't an end-of-object marker.

Synopsis

assert-end-object (state) => ()

Parameters

stateAn instance of <load-state>.

Return Values

None.

Description

Ensure the subobject list ends with the end-of-object token.

Circular Usage

To deal with circular object usage graphs, several mechanisms are employed. Objects can be dumped as forward references and defined later. The loader for the object containing the forward reference has the option of delaying object creation until the forward referenced object is loaded, or requesting that the object be backpatched when the forward referenced object is finally loaded.

The following descriptions TBD:

  • new-local-id

  • label-next-object

  • dump-local-reference

  • <forward-ref>

  • obj-resolved?

  • actual-obj

  • request-backpatch

  • resolve-forward-ref

Object Sharing

By default, it is assumed that the object being dumped has no object sharing, i.e., for all sub-objects the ref-count is one. For any object that has a ref-count greater than one, that object must be a sub-class of <identity-preserving-mixin>. (If it is ok to dump the object multiple times and load it back as separate objects, then this superclass may be omitted.)

The following descriptions TBD:

  • <identity-preserving-mixin>

  • maybe-dump-reference

Binary Representation

To allow efficiency as well as potential interchange, objects are described by a word sequence which vaguely resembles the in-core representation. The word size is implementation defined, and dumper/loader methods dump/load data as words having the native byte ordering. The format is defined so that any necessary byte-swapping and word-size adjustment can be done as a pre-pass filter to the actual loader. The format is described in detail at the top of od-format.dylan.

The implication of the above is that data-unit files produced from OD-Format are not portable.

Examples

These examples show the bits for some literals, assuming with 32bit word size and big-endian order (which, by the way i386 is not big-endian), and assigning convenient values for the object class-IDs. See the actual code for the real definitions and class IDs.

For example, we could describe a byte-string like this:

    0: <Header? = #b1, Etype = #b00, Subobjects? = #b0, Raw-Format = #x1,
	Class-ID = byte-string-odf-id>
    1: string byte-count
    2[ceiling(byte-count, word-bytes)]: 
       <string-chars: *>

Assuming byte-string-$odf-id is #x666, the actual bits for "Foo" would be:

    #x81000666
    #x00000003
    #x466F6F00

A list has subobjects, but no raw data, so it would look like:

    0: <Header? = #b1, Etype = #b00, Subobjects? = #b1, Raw-Format = #x0,
        Raw-Size = #x00, Class-ID = list-odf-id>

    1[N]: <header of subobject 0, subobject data>
       ...

    M: <Header? = #b1, Etype = #b01, Start-Offset: * = M>

If list-odf-id were #x667, then #("Foo", "Bar") would be:

    0: #x90000667
    1: #x81000666
    2: #x00000003
    4: #x466F6F00
    5: #x81000666
    6: #x00000003
    9: #x42617200
    7: #xA0000007

This demonstrates the nesting of objects, and shows how the end is marked. Note that:

  • "Foo" is represented by exactly the same bits here (it is position-independent), and that

  • The entire sequence is position-independent, since the end-entry has a relative offset, and that

  • The basic subobject protocol doesn't say in advance how many subobjects there's going to be; you have to wait for the end header. This could be avoided for e.g. vectors by considering subobject 0 to be the length.

The Dylan-Dump Module

This module provides ODF dumping and loading routines for all of the primitive types.

The Compile-Time-Values Module

This module exports <ct-value>, which represents a value known at compile time. It also supplies <literal> and a number of subclasses to represent literal strings, integers, lists and other primitive types.

Other compile time values can be found elsewhere. Literal <type> objects are handled by classes in the CType module. Literal <class> objects are handled by classes in the Classes module. Literal <function> objects are handled by classes in the Compile Time Functions module.

The Source Module

d2c was designed to fetch source records from several sources, including flat text files and eventaully code databases. To permit this flexibility, d2c uses a very abstract interface for representing locations in source code.

The <source-location> class is an open, abstract class with no slots. All instances must provide a method for describe-source-location.

The source-location generic takes an arbitrary object as an argument and returns a source location. The mix-in class <source-location-mixin> provides the init-keyword source-location: and a method on source-location which returns the value of the keyword or <unknown-source-location>.

The <source-file> class reads an entire input file into memory and provides access to the contents. It's tightly-coupled to the class <file-source-location>, which represents a location within a source file. Separating these two classes is not feasible. Among other things, this code generates the attractive, multiline displays of exactly where an error occured.

The Tokens Module

This module provides classes and constants for representing tokens and syntax tables.

The Header Module

Dylan source files have RFC 822 headers, similar to those used by Internet e-mail and many other protocols. This module provides support for parsing headers at the top of a file and extracting the values of keywords. White space at the begining and ends of lines is removed automatically.

The Platform Module

This module contains code to parse platforms.descr, which contains descriptions of the various host platforms supported by d2c. A description of each of the parameters appears in platforms.descr itself.

The Errors Module

This module exports a number of classes and functions related to compiler errors and warnings.

The Signature-Interface and Signature Modules

The Signature-Interface module creates a group of names which are actually implemented in the Signature module. These represent the formal parameter and result lists of a Dylan function.

The Names Module

When reporting errors, d2c often needs to provide names for various functions, anonymous local methods and other things found in Dylan programs. This module knows about several different kinds of names and how they should be displayed to the user.

It does not appear that these classes are used in as part of the compiler's own computations. A quick look suggests that this is merely interface and debugging code.

The Definitions Module

The top-of-file comment for this module reads:

Abstract class that all definitions inherit from. A definition is the compilers handle on some run-time value. In addition to the obvious things, definitions exist for things like the type of a variable (when it isn't a compile-time constant).

It appears that named, top-level forms each get a definition. From the comment, it sounds as if the following code produces two definitions:

define variable foo :: run-time-expression() = #f;

One definition refers to the variable itself, and the other refers to the type value computed at initialization time. Is this a correct assumption?

The Variables Module

This module is in change of libraries, modules and namespaces. It maintains the global table of libraries, processes the various clauses in define library and define module, and keeps track of where variable definitions live. The magic Dylan-User modules get set up here as well.

Each <variable> object also provides slots for the associated definition, transformers and constant evaluator. It would appear that this module defines some of the more important data structures in the compiler.

The Policy Module

The Policy represents an optimization policy: a set of parameters used for making tradeoffs between speed, safety and code size. A quick inspection of the code shows that a <policy> object will not survive the dumping and loading processes intact. Do these actually get used?

The CType Module

This module defines the compiler's type model. This is used by the Flow module to represent the types of expressions, and by the Compiler-Optimize library during several phases of optimization.

Most of these types can be represented as Dylan objects. Such types also inherit from <ct-value>, so they can be represented as compile-time values (see the Section called The Compile-Time-Values Module).

All ctypes inherit from <values-ctype>. Single-value types (e.g., the types of variables) are represented by subclasses of <ctype>. Multiple-value types (e.g., the return values of functions) are represented as instances of <multi-value-ctype>.

Various subclasses of <ctype> represent class types, union types and all the standard (and non-standard) limited types supported by d2c. The class <unknown-ctype> represents a type we won't know until runtime. The function empty-type returns a type union with no members. No object can ever be an instance of this type. (Empty types are used extensively when computing the intersections, unions and differences of types.)

Class types are defined in the Classes module.

<multi-value-ctype> is a bit tricky. It represents zero or more required values of known types, zero or more optional values of known types, and an optional #rest list—which may also be typed. (See the source for more information.) The function wild-ctype returns a multi-value type representing zero or more values of any type.

This module supports type intersections, type unions and type differences, primarily for use by the optimizer. All of the operations may be imprecise; if so, the second return value will typically be false.

The intersection of two types is described as:

Return as restrictive a type as we can discover that is no more restrictive than the intersection of Type1 and Type2. The second value is true if the result is exact. At worst, we arbitrarily return one of the arguments as the first value (trying not to return an unknown type).

The union of two types is described as:

Find a type which includes both types. The result is an unknown type if either of the arguments are unknown; otherwise the result is precise.

The difference of two types is described as:

Return our best guess at the type that describes all objects that are in Type1 but not in Type2. If we can't precisely determine this type, then return something more inclusive than it, but never more inclusive than Type1.

Other things in this module include subtype tests, type extent computation and type specifiers. The various "-dispatch" functions have nothing to do with the compiler's support for generic functions—they're just hooks for extending various generic functions defined in this module.

The Transformers Module

Transformers are functions that are attached to function definitions. They serve as a sort of procedural macro in the optimizer. For instance, the do() function is transformed into an ordinary loop construct so the optimizer can bang on it some more. See the cheese module for usage of this.

The Representation Module

This module provides a very abstract interface for choosing data representations in generated code. It currently appears to be implemented by the C-Representation module.

The Classes Module

This module contains the guts of the class system, which is implemented with classes as a subtype of <ctype>. This module also computes class precedence lists, unique class IDs and slot layouts.

The Type-Dump Module

This is an implementation module which contains ODF support for various subclasses of <ctype>.

The C-Representation Module

This modules knows about the different C data types. It also knows about heap representations, direct representations and general representations.

The Compile-Time-Functions Module

From the top-of-file comment:

A ct-function is a kind of compile-time value used to represent a function. ct-functions contain various linkage-related information needed to call the function, but don't reference the FER for the function (e.g. the <function-literal>.) This information is used both by the backend and by the heap builder.

This is also where Peter Housel added the support for callback functions. This module includes support for general entries and generic entries, which are presumably the type-checked and dispatched entry points for functions, respectively.

The Flow Module

These classes define the flow model used by the compiler. Together with the classes defined in the Front module, they make up the intermediate representation used to communicate between the Compiler-Convert library, the Compiler-Front library, the Compiler-Optimize library and the Compiler-CBack library.

Unfortunately, the front-end and the back-end of the compiler use a slightly different version of the intermediate representation. To confuse matters further, both versions are called the "front-end representation" (see the Section called The Builder-Interface Module for more details).

The actual flow model is fairly straightforward. Data-flow classes are defined in data-flow.dylan, and control-flow classes are defined in control-flow.dylan.

The data-flow classes represent variables, constants, operations and assignments. They use a generalization of the traditional three-address form. Expressions are broken down into a series of step-by-step assignments to temporary variables. The right-hand side of an assignment may contain a constant, a variable or a single operation (nesting is not allowed). The operation may take any number of arguments, each of which must be another variable or a constant.

When the flow classes are first created, a particular variable may be assigned to more than once. The optimizer, however, converts all variables to single static assignment (SSA) form. In SSA form, each variable is given a value at creation time and never modified. This restriction simplifies many optimization algorithms considerably.

Prior to SSA conversion, variables are represented by the class <intial-variable>. Each individual assignment to a variable is represented as an <initial-definition>. After SSA conversion, variables are represented by the class <ssa-variable>.

The control-flow model represents groups of assignments as regions. The simplest kind of region is <simple-region>, which represents a linear series of assignments. A <compound-region> holds several regions which are executed in sequence. The various subclasses of <join-region> represent control constructs with non-linear flow patterns. An <exit-region> transfers control to an enclosing block; various subclasses are used to return from functions, exit loops and perform other kinds of funky control flow.

This part of the compiler is well-commented, so you may want to read the source for further details. You'll also want to look at the classes defined in Front module, most of which extend the control- and data-flow model in various useful ways.

The classes <join-operation>, <join-assignment> and <join-region> are currently ignored by the rest of the compiler. In each case the word "join" refers to the Phi function used in traditional SSA form. These classes will be used if anybody ever fixes the optimizer's SSA conversion.