Wednesday, November 21, 2007

Jump!

A fair share of the algorithms implemented by the peephole optimizer have to do with jump optimization. The compiler has already been carefully crafted to produce the best possible code before reaching this pass, but some transformations may reintroduce small inefficiencies that must be fixed one way or another.
This is a summary of the tasks performed by our optimizer:
  1. Chained jump detection
A chained jump is a branch instruction that jumps to an unconditional branch. In some pathological cases, the chain may extend through several jumps. When the optimizer detects such a chain, it replaces every intermediate target with the target at the end of the chain. We must be careful to avoid circular jumps.
  2. Jumps to return
An unconditional branch may land on a return operation code. Execution time is saved by replacing the original branch with a return.
  3. Common branch suffixes
As already explained here, this is a space-saving optimization that detects repeated code at the end of converging execution paths. Since time is more important than space, this pass is executed after jump-to-return detection.
  4. Unreachable code detection
Code is symbolically executed, marking all instructions reached by some execution path. Then, all unmarked instructions are deleted.
  5. Trivial jump removal
As a side effect of the previous pass, the code is compacted and some trivial jump patterns may be introduced. For instance, we could have a jump to the immediately following instruction in the code stream, or a conditional branch skipping an unconditional branch, which can be replaced with a single inverted conditional branch.
  6. Peephole optimization, properly speaking
Now we scan the instruction stream for short code sequences that can be simplified. For instance, we look for consecutive assignments to the same field, keeping just the last one. Consecutive assignments may be generated by automatic transformations, such as the algorithm for generating iterators. If a local variable is written and then read, and this is the only use of that variable, we can drop both the write and the read. This seemingly silly pattern is frequently generated by for/in loops.
  7. Tail recursion
Then we look for recursive calls immediately before a return operation code. When some safety conditions are met, we scan backwards over the code that pushes the parameters for that call, in order to modify the arguments in the current stack frame, and then we replace the method call with a jump to the beginning of the method. We never use the tail prefix from the IL instruction set, since it's not efficiently implemented.
  8. Branch operand size tuning
Our last pass assigns an offset to each surviving operation and computes the distance for each remaining jump. Then, branch codes are changed into short jumps whenever possible. A full-size jump requires five bytes, while a short jump is encoded in just two.
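To make the first two passes concrete, here is a minimal Python sketch of chained jump detection and jump-to-return on a toy instruction list. The tuple-based representation, opcode names, and helper functions are invented for illustration; Freya's real optimizer works on IL instruction records.

```python
def resolve_chain(code, target, seen=None):
    """Follow a chain of unconditional branches to its final target,
    guarding against circular jumps with the 'seen' set."""
    seen = set() if seen is None else seen
    op, arg = code[target]
    if op == "br" and target not in seen:
        seen.add(target)
        return resolve_chain(code, arg, seen)
    return target

def optimize_jumps(code):
    """code is a list of (opcode, operand) pairs; branch operands
    are absolute instruction indices."""
    result = []
    for op, arg in code:
        if op in ("br", "br.false", "br.true"):
            # Chained jump detection: retarget to the end of the chain.
            arg = resolve_chain(code, arg)
            # Jump to return: a branch landing on 'ret' becomes a 'ret'.
            if op == "br" and code[arg][0] == "ret":
                op, arg = "ret", None
        result.append((op, arg))
    return result

code = [
    ("ldc.i4.1", None),   # 0
    ("br.false", 3),      # 1: jumps into another jump...
    ("nop", None),        # 2
    ("br", 5),            # 3: ...which lands on a return
    ("nop", None),        # 4
    ("ret", None),        # 5
]
print(optimize_jumps(code))
```

After the pass, the conditional branch at index 1 points straight at index 5, and the unconditional branch at index 3 has become a return.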
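The final pass can be sketched as a fixpoint loop: the five-byte and two-byte sizes are the real IL encodings for long and short branches, but the dictionary-based representation and the size table below are illustrative assumptions, not Freya's actual data structures.

```python
SIZES = {"long": 5, "short": 2, "other": 1}

def tune_branches(code):
    """code: list of dicts like {"kind": "long", "target": index}.
    Shrink long branches whose displacement fits in a signed byte,
    iterating until a fixpoint: shrinking one jump may bring other
    jumps into short range."""
    changed = True
    while changed:
        changed = False
        # 1. Assign a byte offset to every instruction.
        offsets, pos = [], 0
        for ins in code:
            offsets.append(pos)
            pos += SIZES[ins["kind"]]
        offsets.append(pos)  # offset just past the last instruction
        # 2. Shrink any long branch that would fit in short form.
        for i, ins in enumerate(code):
            if ins["kind"] == "long":
                # Displacements are relative to the next instruction,
                # measured as if this branch were already short.
                disp = offsets[ins["target"]] - (offsets[i] + SIZES["short"])
                if -128 <= disp <= 127:
                    ins["kind"] = "short"
                    changed = True
    return code

prog = [{"kind": "long", "target": 2},
        {"kind": "other"},
        {"kind": "other"}]
print(tune_branches(prog)[0]["kind"])  # the nearby branch shrinks to short
```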


Tuesday, November 20, 2007

Common branch suffixes

Another little enhancement has made its way into the Freya optimizer: common branch suffix detection. This technique detects repeated operation codes in code paths converging at a single point, and moves the branch instruction to take advantage of the common code. It's a trick for reducing code size without affecting speed, at least not directly. Of course, less code means faster JIT translation and better CPU cache performance.
Let's say we must generate code for this statement:
if condition then
    a.m(p1, p2, q)
else
    a.m(p3, p4, q)
The translation scheme would be as follows:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    EVAL(q)
    callvirt m
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
    EVAL(q)
    callvirt m
@@continue:
It's hard for the compiler to detect the common code in the Abstract Syntax Tree, so this is a task for our peephole optimizer. The optimizer kicks in after each unconditional jump and scans backwards in parallel from the branching point and from the branch target. The above translation scheme is simplified this way:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
@@continue:
    EVAL(q)
    callvirt m
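The backward parallel scan can be sketched in a few lines of Python, using the translation scheme above as a flat list of instruction strings. The function name, the string-based representation, and the merge step are simplifying assumptions; the real optimizer compares IL instructions and must also retarget the unconditional branch.

```python
def common_suffix_length(code, branch_at, target):
    """Scan backwards in parallel from the unconditional branch at
    index branch_at and from its target, counting equal trailing
    instructions. The target-side scan must not cross back over the
    branch itself, hence the j > branch_at guard."""
    n = 0
    i, j = branch_at - 1, target - 1
    while i >= 0 and j > branch_at and code[i] == code[j]:
        n += 1
        i, j = i - 1, j - 1
    return n

# The translation scheme from the post, flattened; the join point
# @@continue sits just past the end of the else arm.
code = ["EVAL(a)", "EVAL(p1)", "EVAL(p2)", "EVAL(q)", "callvirt m",
        "br @@continue",
        "EVAL(a)", "EVAL(p3)", "EVAL(p4)", "EVAL(q)", "callvirt m"]
k = common_suffix_length(code, 5, 11)  # both arms end in EVAL(q); callvirt m

# Hoist the k common trailing instructions past the join point; the
# else arm now simply falls through into the shared suffix.
merged = (code[:5 - k] + ["br @@continue"] +
          code[6:11 - k] + ["@@continue:"] + code[5 - k:5])
print(k, merged)
```

Running this finds a common suffix of two instructions and produces the same simplified scheme shown above.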
In almost all situations, this change won't affect speed. There is, however, a case where we must be careful: suppose our conditional were the last statement in a method. In that case, the unconditional branch would jump to a return code, and the peephole optimizer would have replaced the jump with a direct return, saving the time consumed by the jump. To avoid losing that opportunity, common branch suffixes are only detected after the jump optimizer has finished its own duty.
Regarding the impact of this optimization on real-world applications, please note that our example can't benefit from Freya's conditional expressions. What's more, the automatically generated code for iterators typically contains many assignments to the state field that can be factored out with this technique.
