Wednesday, June 17, 2009

Trust but verify




When we started True Basic in 1983, none of us had real microprocessor experience. A couple of us had worked on minis and a cool bit-sliced machine, but we had no mainstream computer chip experience. So, we immediately went out and got a Z8000 based Unix box for development. Seriously. Then we bought an IBM PC and before the end of the year, we had an Apple Mac and soon an Apple Lisa.

Our strategy was to develop portable software on the Unix box and then load it onto various forms of personal computers. It wasn't obvious at the time that the Intel/IBM architecture was going to be so successful and we also had dreams of getting our software to run on the Apple II which had good numbers back then, plus the Motorola 68000 machines and whatever came next.

The software architecture was pretty standard stuff: Write almost everything in a high-level language, compile this into a byte code for an imaginary machine, then interpret this machine with an interpreter specific to each computer we want to support. Everything but the interpreter would be common to all machines, and the interpreter could be written in either a high-level language, or in assembler if speed was a problem.

Since each machine had requirements for data layouts etc that would impact performance, there was a linker that knew about the particular target machine and set up auxiliary tables as needed. The linker was written in a high-level language, too. So the only thing that, in theory, needed to be changed when porting to a new machine was the interpreter.

So we got to design that ideal machine for ANSI BASIC and then write several interpreters for it. David Pearson architected a storage-to-storage machine with complex effective addressing. There were other language systems at the time that used a stack-based machine, notably the UCSD P-system, but I think our design led to more compact code though it required a larger interpreter. It was an engineering guess that we would save space for the total system doing this.

For example, if you had the statement:


let a= b+c

with a stack machine this would turn into something like:

push b
push c
add
pop c

Assuming that an opcode would take 1 byte and an address would take 1 byte also, this would add up to 7 bytes. If you did the same thing with a memory-to-memory instruction set, it would look like:

add b,c,a

and this would be only 4 bytes long.

With the ideal machine target settled, we could immediately write an interpreter for it, build a simple compiler and development environment and try stuff out. And so we did. We picked a small subset of the eventual language to implement, wrote the interpreter in Lattice C, and implemented the compiler and IDE (development environment or UI) in, listen carefully, True Basic. Since we didn't have a True Basic compiler (yet), anything written in True Basic was actually written in the form of Basic that was on the Unix box. But only that subset of the language that existed on both the Unix box and True Basic was used.

This version had lots of limitations, by design. It was written in "near memory model" so only 64K bytes of memory could be used for data. This simplified our compiler and interpreter, but put a serious cap on the size of programs that could be run. And because the interpreter had just used the C memory manager, if a request was rejected there was no way to compact memory and try again. But this version was completed in a couple of months and allowed other people to try the language. It also provided us with a demo version to show to investors or publishers.

For the real version, we had to start over and write the real compiler for the whole language, and the interpreter for the IBM PC had to be written in assembler. C compilers didn't do a great job with optimizing back then, and we needed the speed to compete with other Basics in the marketplace.

Since I was going to be programming in assembler, I read the Intel manuals from cover to cover. I reverse engineered the rules for how to calculate the cycles taken per instruction rather than just looking each one up, so I had a mental model of what was costly and what wasn't. Then I dove in and wrote some code and timed it. Lo and behold, the IBM PC was nowhere near as fast as the Intel manual would lead you to believe. This baffled us for a while.

Then we noticed a little fine print in the back of the manual. It said the timings all assumed the instruction prefetch buffer was full. On the 8086 and 8088, there was a 6 byte lookahead buffer that would fetch the next instructions if there was memory bandwidth free. This might happen on 8086 machines with fast RAM, but on an 8088 with an 8 bit bus and slow RAM, this would rarely happen.

In the end, the best model for speed on the IBM PC was to count the number of bytes that were read or written and multiply by 1 microsecond. About the only time where the prefetch buffer could get ahead was when a divide operation happened.

I created a list of one byte, two byte and three byte instructions and memorized them. In particular, if I could use a one byte instruction in place of a two or three byte, I did. I pretty much ignored Intel's timings.

Lesson learned: Trust but verify. Or if you have created a mental model, verify it if possible.

I plan in future posts to cover some of the tricks or coding gems from this interpreter, and perhaps some of the cool designs in this language system. I wasn't the designer, so I can say this. It was ported to several machines including the Amiga, Atari, Apple Mac, and a Wang VS machine (something like IBM 360 with a stack). It was later ported to 32 bit Windows, but I had left the company by the time that happened.

1 comment:

Stanley_Krute said...

Ah, the luxury/joy of writing something in assembly lang., close to the metal.

I remember being quite sad when IBM/MS chose the Intel processors rather than the Motorolas, since the 68k instruction set was much more orthogonal and thus easier to learn/use for those of us w/ weak memory skills.

My first 6 months of Mac programming, back in 1984, were in assembly language. Pretty funny to think of that, esp. in this modern world of php/javascript/html website prog'ing.

-- stan

Post a Comment