;

The signum function returns that sign of the argument. There are several different definitions, but a common one can be written in C as

int Signum( int x )

{

if( x < 0 ) return -1;

if( x == 0 ) return 0;

return 1;

}

In x86 instructions, this can be done with:

CDQ ; (1) x (eax) is sign extended into edx

NEG EAX ; (2) negate x (carry set if x>0)

ADC EDX,EDX ; (2) add with carry either -1 or 0

; result in edx

Not bad for 5 bytes of code.

This code was discovered by Henry Massalin and documented in his paper

Superoptimizer: a look at the smallest program published in 1987. This short 5 page paper had a number of interesting ideas, but the major for generating small functions was:

1) Generate all reasonable functions of size 1, 2, 3... bytes long,and

2) Eliminate the ones that clearly don't compute the same value as the function needed.

The first step is relatively easy, but you have to eliminate wild jumps or memory references to arbitrary addresses. From my reading, he just eliminated all jumps and memory references and only worked on values in registers.

The second step was accomplished by sending test vectors of values to the known correct function (e.g. signum function above) and comparing the results with the generated function. Since most generated functions will be wrong, the chances of generated function getting the correct answer for all the test vector is very small. Still, the author examined that generated function if it passed all the test vectors.

The test vectors could be hand generated (if you know a lot about the function), or generated randomly. Obviously, if the user generates them it takes more time initially, but that may speed up the process later. I suspect for many functions, generating random numbers is good enough.

There have been many follow-up papers to this. While it was easy to minimize the code size, with a proper CPU model of timings, the program could also compute minimum time routines. And since the number of different routines grows exponentially with the number of instructions, you will need a lot of computing time to generate and test routines greater than 10-15 bytes long without some significant changes to the generator.

Massalin mentioned a not so obvious usage: giving feedback to the assembly language programmer. He mentioned that he used it to write a 500 byte

*printf*routine. New assembly language is not written much anymore, but it still comes up. For example, Excel's recalc engine is written in 32 bit x86 assembler and if its performance wasn't good enough, this type of optimization could come in handy.

There is a parallel between this idea and the idea of testing two or more implementations of the same function. In testing, you may have a known good model for what the code is supposed to do and the implementation under test. In this case, you take your test vector, run it through both your model and the implementation being tested and compare the results. If you are lucky, by definition the results should be the same. The model may be correct but quite slow, but this is fine for testing.

This is one of the ways to test graphics drivers. The model for every operation may be easy enough to write but way too slow for real systems while the driver is optimized for speed both because of the code and the graphics chip. But having one model of what is right allows you to test many different graphics drivers and chips.

One can even use this idea to test computer languages if you have more than one implementation. All you have to do is know how to generate valid programs, and then let the different implementations run them. If the results differ, you need to investigate. There are many limitations to this scheme, but it can improve the testing over hand coded tests. More about this later...