A fair share of the algorithms implemented by the peephole optimizer has to do with jump optimization. The compiler has already been carefully crafted to produce the best possible code before reaching this pass, but some optimizations may reintroduce small inefficiencies, that must be solved one way or another.
This is a summary of the tasks performed by our optimizer:
- Chained jumps detection
A chained jump is a branch instruction that jumps into an unconditional branch. In some pathological cases, the chain may extend through several jumps. When the optimizer detects such a chain, it substitutes all intermediate targets by the target at the end of the chain. We must be careful in order to avoid circular jumps.
- Jumps to return
An unconditional branch may land on a return operation code. Execution time is saved by substituting the original branch with a return.
- Common branch suffixes
As already explained here, this is a space-saving optimization that detects repeated code at the end of converging executing paths. Since time is more important than space, this pass is executed after jump to return detection.
- Unreachable code detection
Code is symbolically executed, marking all instructions reached by some execution path. Then, all non-marked instructions are deleted.
- Trivial jumps removal
As a side effect of the previous pass, code is compacted and some trivial jump patterns may be introduced. For instance, we could have a jump to the immediately next instruction in the code stream, or a conditional branch skipping an unconditional branch, which could be substituted by a single inverted conditional branch.
- Peephole optimization, properly speaking
Now we scan the instruction stream searching for some short code sequences that can be simplified. For instance, we look for consecutive assignments on the same field, leaving just the last assignment. Consecutive assignments may be generated by automatic transformations, such as the algorithm for generating iterators. If a local variable is written and then read, and if this is the only use of that local variable, we can drop both the read and the write operations. This seemingly silly pattern is frequently generated by for/in loops.
- Tail recursion
Then, we look for recursive calls before a return operation code. When some security conditions are met, we scan back the code that pushes the parameters for that call, in order to modify the arguments in the current stack frame, and then we substitute the method call by a jump to the beginning of the method. We never use the tail prefix from the IL instruction set, since it's not efficiently implemented.
- Branch operand size tuning
Our last pass assigns offsets for each surviving operations, and computes distances for each remaining jumps. Then, branch codes are changed into short jumps whenever possible. Full size jumps requires five bytes, while a short jump is codified in just two bytes.