Ben Goetter's unnecessarily apocalyptic weblog.

Sun, 26 Mar 2006

The Iota discovery

It took me a day to locate a suitably burly debugger, install it, and build pscheme with it. This entailed installing an almost-expired beta of VS 2005 (which, incidentally, looks like a very nice product—I'll save my pennies for a copy), then installing a matching WM5 SDK, and then endless fussing: first to get pscheme to build; then to get the debugger talking to the device, working around bugs in the beta; then to make the iota bug repro; and finally to capture the bug. A productive 24 hours, really, in the same sense than running ten miles on a stationary treadmill is running a great distance.

Turns out that I didn't break anything: the bug's been there for the last year, since I released 1.2. (Longer, really. I wrote the code in 2003.) And the underlying flaw in the garbage collector that creates the bug has been there since the last millenium.

The pscheme garbage collector uses the pointer-reversing D-S-W (Deutsch-Schorr-Waite) algorithm to perform the mark phase of a mark-sweep in a constrained amout of stack. (This design originated from running on a very old version of Windows CE that limited its applications to 58Kb of staack.) However, I was lazy on that day in 1998 and didn't D-S-W when marking vectors (and continuations, which in pscheme terms are implemented as a sort of degenerate vector), simply recursing over each element instead. Also, since 1.2 pscheme has used chained continuations to allow it to evaluate arbitrarily complex expressions in a constrained runtime stack: whenever it exhausts its runtime stack, it saves the current continuation, then resets the stack, restoring the previous continuation as necessary to resume computation from that point. Finally, debug builds drastically reduce the size of these stack segments in order to exercise the chaining code.

Taken all together, a call to Iota on a debug build generates a large number of stack-segment continuations. And those continuations contain references to other continuations in the stack, which in turn refer to others, and so on, and so on. The garbage collector calls itself recursively when marking this sequence. And the garbage collector is the only part of pscheme that recurses without chaining continuations. So eventually a few thousand stack segments build up, and pscheme simple-mindedly tries to mark them recursively. Boom, stack fault.

The right fix for this is to finish the job that I started in 1998 and do the D-S-W pointer reversing trick within vectors and continuations. That, however, may have to wait until after 1.3. Short term, I'll turn off the debug-mode stack-swap exercising code that was creating a new continuation with every Kb of runtime stack, and I'll add some extra stack discipline to the mark phase so that it fails more gracefully in the face of (iota 0 1000000) or whatever.

Speaking of stack discipline, here's a happy discovery: WM5 lets exception handlers capture stack faults! In previous releases of Windows CE, any stack fault in a thread would terminate the entire application, exception handling notwithstanding.

posted at: 22:09 | path: /pscheme | permanent link to this entry

Powered by blosxom.
Based on a true story.
Syndicate this, if you dare.
Copyright 2006, Ben Goetter. All rights reserved.