Götterdämmerung

Ben Goetter's unnecessarily apocalyptic weblog.


Tue, 14 Nov 2006

The 2.0 VM

I've spent the last couple of weeks hand-compiling Scheme programs, working out the details of the VM for the next release. I've ended up with a VM strongly reminiscent of the CEK abstract machine.

While this is a work in progress, it'll do me good to write it up here.

Registers

CV
The current code vector. All branch operations (JUMP, JUMPFALSE) are relative within this.
IP
The instruction pointer, an offset within CV denoting the next instruction to decode and execute. Think of this as R15.
DV
The current data vector, a pool of read-only values. All literal data references (LITERAL) are relative within this.
VAL
The evaluated value of the last expression (in Scheme terms). This is used as a kind of general accumulator, a R0 type of register.
ENV
The top of a chain of frames implementing the current lexical environment.
FRAME
One or more partial calculations in progress within the current environment, implemented as a chain of partially defined frames. Taken together with ENV, this is R13 (though we're not a stack machine).
CONT
The top of a chain of records tracking the continuations of pending calls. Each record comprises an IP, a saved ENV and FRAME, and a link to the next record. Think of this as R14. This would be the callstack, were it not for call-with-current-continuation.

One advantage of keeping the activation chain separate from the process callstack is having Windows system calls interoperate nicely with reentrant continuations. In Pocket Scheme 1.x, FFI wndprocs and the interpreter share a common stack, which means that any windows running in the Scheme thread (i.e. any windows with a wndproc written in Scheme) must take care never to reenter themselves accidentally.

Instructions

FRAME size
Creates a new frame capable of storing size values, linking it to the existing value of FRAME, then setting FRAME to this new frame.

A frame is a multipurpose structure, responsible for saving the values of local variables, saving temporary computation results, and finally passing a set of parameters to a procedure. A frame starts life with a FRAME instruction, and has its elements set with FRAMESET instructions. It may stay in FRAME for a while, possibly as the tail of a chain of frames as created by subsequent FRAME operations, but eventually it will move into the ENV chain via PUSHENV, CALL, or TAILCALL. A frame ends its life via RETURN or POPENV, both of which immediately recycle the frame for subsequent FRAME calls, or a CALL or TAILCALL that makes it unreachable.

A frame is recycled by returning it to a pool from which subsequent FRAME instructions will reallocate it. Operations that break the stack access pattern of a frame will mark it as nonrecyclable, in which case the frame does not return to the pool, but instead is left for eventual GC. The CLOSURE instruction will mark every frame in the current ENV as nonrecyclable. Likewise, the procedure call-with-current-continuation will mark every frame in both ENV and CONT as nonrecyclable, so that those frames last for the life of the reified continuation.

For this reason, the compiler attempts to elide CLOSURE operations wherever possible (e.g. tail-calls to procedures defined via let and letrec).

FRAMESET index
Sets an element of the topmost frame in FRAME to the current contents of VAL. Used for apply and let, but not letrec.
CALL
Applies the closure in VAL to the arguments in the topmost frame. The current ENV, the remainder of the partial FRAME chain, and IP+1 are all saved in a new record linked to the top of CONT, from where they will be restored by the next RETURN operation. ENV is set to the environment saved in the closure, then extended by the topmost frame. FRAME is cleared. IP is set to the first instruction in the closure's code.
RETURN
Sets IP, FRAME, and ENV to the last values saved by CALL, popping the topmost element of CONT. The topmost frame of the previous ENV is immediately recycled, if possible.

Issue: if VAR-ARITY has extended the environment, RETURN should recycle two frames.

TAILCALL
Applies the closure in VAL to the arguments in the topmost frame, using tail-position semantics. ENV is set to the environment saved in the closure, then extended by the frame in FRAME, which should be the only frame in the chain. IP is set to the first instruction in the closure's code. FRAME is cleared. No new record is added to CONT.
PUSHENV
Transfers the topmost frame from FRAMES to ENV, extending the lexical environment with the values stored therein. If every element in the frame has been set, this is a Scheme let; otherwise, it is a Scheme letrec.
POPENV
Removes the topmost frame from ENV, immediately recycling it if possible.
JUMP offset
Adjusts the IP by the specified offset.

I always write JUMP with a symbolic offset, just as I do JUMPFALSE and CLOSURE. A LABEL pseudo-instruction gives a name to the target.

JUMPFALSE offset
If the contents of VAL are false, adjust the IP by the specified offset. Otherwise, execution will continue at IP+1.
CLOSURE offset
Creates a new closure object, saving the current ENV, the current CS, and the current IP, adjusted by the specified offset. VAL is set to the new closure.

Closing over the environment marks every frame in the current environment as nonrecyclable.

LITERAL offset
Loads VAL with the value found in DV at the specified offset.

I always write LITERAL with the explicit datum, just as I do GLOBDEF, GLOBSET, and GLOBREF.

GLOBDEF offset
Defines a new entity in the global namespace, named by the symbol found in DV at the specified offset, and taking the current VAL.
GLOBSET offset
Sets the global element named by the symbol found in DV at the specified offset to the current VAL.
GLOBREF offset
Sets VAL to the value of the global element named by the symbol found in DV at the specified offset.
LEXSET depth offset
Sets the element in the current environment specified by the depth and offset (de Bruijn indices) to the current VAL.
LEXREF depth offset
Sets VAL to the element in the current environment (ENV) specified by the depth and offset.
FIXED-ARITY arity
Saves meta-information about a procedure. Currently, tests topmost frame in ENV (as applied by CALL or TAILCALL) to compare its length to the specified arity. Raises a runtime error if the two quantities differ.

The intent is for all -ARITY instructions eventually to emit procedure metadata that is interpreted by CALL instead of performing the check inline. This would allow a better error message in the case of a mismatch in a tail call, allow a direct jump to the callsite for a case-lambda, and for calls to statically determined sites, would allow arity and type checks to be hoisted higher in the calltree.

MIN-ARITY arity
Saves meta-information about a procedure. Currently, tests topmost frame in ENV (as applied by CALL or TAILCALL) to compare its length to the specified minimum arity. Raises a runtime error if the specified arity is greater than the number of parameters passed.
VAR-ARITY offset
Skips the first offset parameters, copying the remaining passed parameters from the topmost frame in ENV into a fresh list. Extends ENV with a new frame containing this new list value. Raises a runtime error if the offset is greater than the number of parameters passed.


posted at: 15:34 | path: /pscheme | permanent link to this entry


Powered by blosxom.
Based on a true story.
Syndicate this, if you dare.
Copyright 2006, Ben Goetter. All rights reserved.