Assembly Code

When dealing with "bare iron," some assembly code is inevitable. There is, for example, no reasonable way to implement thread switching entirely in C or C++, and on most processors interrupt and trap handlers need assembly code to save the context of the interrupted thread and set up a fresh C/C++ calling sequence.

However, assembly code should typically be kept to a minimum. A human can do a good job of optimizing a small stretch of assembly code, but as the code grows, the human capacity to manage resource allocation becomes overwhelmed and the compiler performs better.

The paradox is that short assembly functions are more likely to be inefficient if placed in separate source files, because the call and return overhead starts to dominate the useful work. What we really want is a way to write inline assembly code. Look at the following example:

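inline int isr_hot_flag()
{
      register unsigned tmp;
      // Read the process controls word into tmp; the zero mask modifies nothing.
      asm("modpc 0, 0, %0" : "=r" (tmp));
      // Nonzero when the caller is running in an interrupt handler.
      return tmp & 0x2000;
}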

This code returns true if the caller is running in an interrupt handler on an i960. The "modpc" instruction is slow, taking around 14 clock cycles to execute, but that is not too bad. The call and return instructions on the i960Jx each consume 6 cycles and also push a register set, so the call/return pair costs at least 12 cycles. Putting this function in a separate source file would therefore nearly double its execution time (from about 14 cycles to at least 26) and might cause a register cache spill as well. Inlining this function is therefore important.
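The cycle arithmetic above can be written out as a small compile-time calculation. This is only a sketch of the estimate given in the text: the constant and function names are hypothetical, the cycle counts are the approximate figures quoted above, and the cost of the final mask operation and of any register cache spill is ignored.

```cpp
// Approximate i960Jx cycle counts as quoted in the text.
constexpr int kModpcCycles  = 14;  // "modpc" instruction
constexpr int kCallCycles   = 6;   // call instruction
constexpr int kReturnCycles = 6;   // return instruction

// Inlined: only the modpc itself is counted.
constexpr int inlined_cycles()
{
    return kModpcCycles;
}

// Out of line: the same body plus one call/return pair (a lower bound).
constexpr int out_of_line_cycles()
{
    return kModpcCycles + kCallCycles + kReturnCycles;
}

// 14 + 6 + 6 = 26, so the out-of-line version costs nearly twice as much.
static_assert(out_of_line_cycles() == 26, "14 + 6 + 6 cycles");
```

Under these assumptions the call/return pair alone costs almost as much as the useful work.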

The benefits do not stop there. The GNU compiler syntax for inline assembly allows the programmer to match up registers and supply constraints that allow the C/C++ compiler to include the code in the optimization phase. The following degenerate case:

int foo() { return 0 && isr_hot_flag(); }

obviously leads to the following optimized i960 assembly code:

_foo__Fv:
      mov 0,g0
      ret
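The operand and constraint syntax can also be tried on a more common target. The following is a minimal sketch and not code from the original text: `add_sketch` is a hypothetical name, the x86 `addl` form is an assumption chosen for illustration, and other targets fall back to plain C++.

```cpp
// Hypothetical helper showing GNU extended asm operands and constraints.
inline int add_sketch(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int sum;
    // "=r"(sum): the output may live in any general register.
    // "0"(a):    this input must occupy the same register as operand 0.
    // "r"(b):    this input may live in any general register.
    // "cc":      addl changes the condition flags.
    asm ("addl %2, %0"
         : "=r" (sum)
         : "0" (a), "r" (b)
         : "cc");
    return sum;
#else
    return a + b;   // portable fallback for other compilers and targets
#endif
}
```

Because the statement is not marked volatile and its only effect is the declared output, the compiler is free to drop it when the result is unused, just as it drops the call in foo() above.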

More complex situations are possible, but the point here is that the inline assembly can and should be included in the optimization passes of the compiler for the best results. A more complex example for the i386[7] works as follows:

inline int call_host1(int sys, int parm)
{
      asm ("int $0x80\n"
               : "=a" (sys)
               : "0" (sys), "b" (parm) );
      return sys;
}

The previous code makes a system call by putting the parameter in the ebx register and the system call number in the eax register, and executing "int $0x80" to trap to the system. The compiler now knows which values must be in eax and ebx for this assembly statement and can use that knowledge when optimizing register allocation around it. It can also eliminate the whole statement if the result is not used at all; for a call that is made for its side effects, the statement should be written as asm volatile so that the compiler keeps it.
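For comparison, the same idea can be sketched for 64-bit Linux. This is an assumption-laden illustration, not the original author's code: `call_host_sketch` is a hypothetical name, the x86-64 `syscall` instruction and its register convention (number in rax, first argument in rdi, result in rax, rcx and r11 clobbered) replace the "int $0x80" interface, and other platforms fall back to the C library's `syscall()` wrapper. The statement is marked volatile and given a "memory" clobber because a system call is made for its side effects.

```cpp
#include <unistd.h>       // syscall() fallback, getpid()
#include <sys/syscall.h>  // SYS_getpid and other system call numbers

// Hypothetical one-argument system call wrapper.
inline long call_host_sketch(long sys, long parm)
{
#if defined(__GNUC__) && defined(__linux__) && defined(__x86_64__)
    long result;
    asm volatile ("syscall"
                  : "=a" (result)            // result comes back in rax
                  : "0" (sys), "D" (parm)    // number in rax, argument in rdi
                  : "rcx", "r11", "memory"); // registers the kernel entry overwrites
    return result;
#else
    return syscall(sys, parm);               // library wrapper on other targets
#endif
}
```

A call such as `call_host_sketch(SYS_getpid, 0)` should return the same value as `getpid()`; the unused argument is simply ignored by that system call.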

This point doesn't have much to do with object-oriented design (except that you can write methods in assembly code) but has everything to do with writing the low-level code in C++. If assembly code could not be inlined, an efficient implementation would necessarily pull more code into the assembly files to cut down on the cost of function call overhead. At that point, the C++ compiler would become more a hindrance than a help.

Including assembly code inline, and using constraints to control the optimizer, eliminates much of the interface overhead between C++ and assembly and gives the programmer the best of the C++ and assembly worlds.

Stephen Williams
Sun May 4 15:28:26 PDT 1997