Wednesday, November 21, 2007

Jump!

A fair share of the algorithms implemented by the peephole optimizer have to do with jump optimization. The compiler has already been carefully crafted to produce the best possible code before reaching this pass, but some optimizations may reintroduce small inefficiencies that must be fixed one way or another.
This is a summary of the tasks performed by our optimizer:
  1. Chained jump detection
A chained jump is a branch instruction that jumps to an unconditional branch. In some pathological cases, the chain may extend through several jumps. When the optimizer detects such a chain, it replaces all intermediate targets with the target at the end of the chain. We must be careful to avoid circular jumps.
  2. Jumps to return
An unconditional branch may land on a return operation code. Execution time is saved by replacing the original branch with a return.
  3. Common branch suffixes
As explained in an earlier post, this is a space-saving optimization that detects repeated code at the end of converging execution paths. Since time is more important than space, this pass is executed after jump-to-return detection.
  4. Unreachable code detection
Code is symbolically executed, marking all instructions reached by some execution path. Then, all unmarked instructions are deleted.
  5. Trivial jump removal
As a side effect of the previous pass, code is compacted and some trivial jump patterns may be introduced. For instance, we could have a jump to the immediately following instruction in the code stream, or a conditional branch skipping an unconditional branch, which can be replaced by a single inverted conditional branch.
  6. Peephole optimization, properly speaking
Now we scan the instruction stream for short code sequences that can be simplified. For instance, we look for consecutive assignments to the same field, leaving just the last assignment. Consecutive assignments may be generated by automatic transformations, such as the algorithm for generating iterators. If a local variable is written and then read, and this is the only use of that local variable, we can drop both the read and the write operations. This seemingly silly pattern is frequently generated by for/in loops.
  7. Tail recursion
Then, we look for recursive calls just before a return operation code. When some safety conditions are met, we scan back through the code that pushes the parameters for that call, modify the arguments in the current stack frame, and then replace the method call with a jump to the beginning of the method. We never use the tail prefix from the IL instruction set, since it's not efficiently implemented.
  8. Branch operand size tuning
Our last pass assigns an offset to each surviving operation and computes the distance for each remaining jump. Then, branch codes are changed into short jumps whenever possible. A full-size jump requires five bytes, while a short jump is encoded in just two.
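The first step above can be sketched in a few lines of Python. This is a minimal model, assuming a simple map from each label to the label its instruction unconditionally jumps to (None for a label that starts real code); it is not Freya's actual data structure:

```python
def resolve_jump_chains(targets):
    """Given a map label -> label it unconditionally jumps to (None for
    real code), replace each intermediate target with the label at the
    end of the chain, guarding against circular jump chains."""
    resolved = {}
    for label in targets:
        seen = set()
        current = label
        # Follow the chain while the target is itself an unconditional jump;
        # stop as soon as we revisit a label, so circular chains terminate.
        while targets.get(current) is not None and current not in seen:
            seen.add(current)
            current = targets[current]
        resolved[label] = current
    return resolved

# L1 jumps to L2, L2 jumps to L3, L3 starts real code:
print(resolve_jump_chains({"L1": "L2", "L2": "L3", "L3": None}))
# {'L1': 'L3', 'L2': 'L3', 'L3': 'L3'}
```

The `seen` set is what keeps a pathological cycle (two jumps pointing at each other) from hanging the optimizer; in that case the chain is simply left where it stopped.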


Tuesday, November 20, 2007

Common branch suffixes

Another little enhancement has made its way into the Freya optimizer: common branch suffix detection. This technique detects repeated operation codes in code paths converging on a single point, and moves the branching instruction to take advantage of the common code. It's a trick for reducing code size without affecting speed, at least not directly. Of course, less code means faster JIT translation and better CPU cache performance.
Let's say we must generate code for this statement:
if condition then
  a.m(p1, p2, q)
else
  a.m(p3, p4, q)
The translation scheme would be as follows:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    EVAL(q)
    callvirt m
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
    EVAL(q)
    callvirt m
@@continue:
It's hard for the compiler to detect the common code in the Abstract Syntax Tree, so this is a task for our peephole optimizer. The optimizer kicks in after each unconditional jump and scans backwards in parallel from the branching point and from the branch target. The above translation scheme is simplified this way:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
@@continue:
    EVAL(q)
    callvirt m
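The backward parallel scan can be sketched in Python over a toy instruction list; the string representation here is hypothetical, not Freya's actual IR:

```python
def common_suffix_length(path_a, path_b):
    """Scan backwards in parallel over two instruction sequences that
    converge on the same point, counting identical trailing instructions."""
    n = 0
    while (n < len(path_a) and n < len(path_b)
           and path_a[-1 - n] == path_b[-1 - n]):
        n += 1
    return n

# The if/else example above: the last two instructions coincide.
then_code = ["EVAL a", "EVAL p1", "EVAL p2", "EVAL q", "callvirt m"]
else_code = ["EVAL a", "EVAL p3", "EVAL p4", "EVAL q", "callvirt m"]
n = common_suffix_length(then_code, else_code)  # n == 2
# The optimizer would then truncate both paths by n instructions and
# retarget the unconditional branch past the join point, so the shared
# tail is emitted and executed only once.
```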
In almost all situations, this change won't affect speed. There is, however, one case where we must be careful: suppose the conditional were the last statement in a method. In that case, the unconditional branch would jump to a return operation code, and the peephole optimizer would have replaced the jump with a direct return, saving the time consumed by the jump. That's why common branch suffixes are always detected after the jump optimizer has finished its own duty.
Regarding the impact of this optimization on real-world applications, note that our example can't benefit from Freya's conditional expressions. More to the point, the automatically generated code for iterators typically contains many assignments to the state field that can be factored out with this technique.


Tuesday, February 13, 2007

Field-based properties

Freya already supports field-based properties. If you declare a property but don't provide an implementation, the compiler automatically adds a hidden field and implements the property using that field. Even if the property is declared as read-only, you can assign values to the hidden field through the property name, as long as your code is located inside the declaring class or inside another class nested in the declaring class.
I was declaring the hidden field as private, but I have realized that it's better to declare it as an internal field, so any class compiled inside the same assembly can access the field instead of the property. Of course, if the property is read-only, any other class still cannot write to the field. Another constraint has to do with virtual properties: if a property is declared as virtual, it's not safe to reference the field instead of the property.


Tuesday, January 23, 2007

Predefined extensions

Freya will have to implement something like C# 3.0's extension methods. I'm not a fan of extension methods, and some details are still awaiting a decision. Meanwhile, we have added some predefined extension methods to the double type, so you can write this:
property Length: Double;
begin
  Result := (x.Sqr + y.Sqr + z.Sqr).Sqrt;
end;
Every method defined in the static System.Math class can be used as an instance method on any System.Double instance. On the other hand, there's no Math.Sqr method at all: it's just an intrinsic method, requiring some compiler magic. Another "intrinsic" method extension: the Ord method applied to a Char expression.
Another addition to the compiler: a "peephole" optimizer. This is an optimizer that works on an IL representation. Right now, the optimizer chooses between short and long branch codes, eliminates chained jumps, removes some useless data writes (most of them induced by the Result variable), and detects unreachable code.


Friday, January 13, 2006

Enhancing code quality

It's an open question whether a compiler writer targeting the CLR should spend time optimizing the generated code. After all, the JIT compiler in .NET 2.0 is almost a miracle of software engineering. In any case, I think it pays off, at least for very simple optimizations with an immediate impact on the code.
If you're compiling a Pascal-like language such as Freya, function result assignment may introduce extra bytes in the code stream. Consider this simple access method:
// From a generic stack implementation.
property Top: X;
begin
  Result := Items[Count - 1];
end;
This was the output from the initial compiler implementation, as reported by Reflector:
// Code size: 22 bytes
.maxstack 3
.locals init (!0 local1)
    ldarg.0
    ldfld !0[] Stack`1::Items
    ldarg.0
    ldfld int32 Stack`1::Count
    ldc.i4.1
    sub
    ldelem.any !0
    stloc.0
    ldloc.0
    ret
What's that local1 variable? It's simply a direct and naive implementation of good old Result! As you can see, there's a silly store/load code sequence at the end of the method.
I've just added to the Freya compiler a very simple and fast algorithm to detect proper function returns, in true C/C# fashion. Now, the compiler generates this code for the same Freya method:
// Code size: 20 bytes
.maxstack 3
    ldarg.0
    ldfld !0[] Stack`1::Items
    ldarg.0
    ldfld int32 Stack`1::Count
    ldc.i4.1
    sub
    ldelem.any !0
    ret
There's no need now for a local variable, and the code stream is two bytes shorter. How will this affect the JIT output? It's highly probable that the JIT compiler would have detected and removed the redundant sequence in the first example anyway. However, with the new compiled code, the JIT compiler has less work to do and the application will load faster. Note that our algorithm is able to detect the return pattern in code like this:
method Silly(Value: Integer): String;
begin
  Console.WriteLine(Value);
  case Value of
    v1..v2: Result := 'Whatever';
    v3: Result := 'Oops';
    v150..v300:
      if Value = 200 then
        Result := 'Yikes'
      else
        Result := 'Uh-oh';
  else
    Result := 'That''s all...'
  end
end;
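The same cleanup can also be expressed as a peephole rule over the emitted opcodes: a trailing store to a local immediately followed by a load of that local and a return collapses into a bare return. A minimal Python sketch, using simplified opcode strings rather than Freya's real instruction objects:

```python
def drop_result_copy(code):
    """Collapse a trailing 'stloc.N / ldloc.N / ret' sequence into a bare
    'ret', leaving the computed value on the stack. This is only safe when
    the local is not used anywhere else in the method (assumed here)."""
    if (len(code) >= 3 and code[-1] == "ret"
            and code[-3].startswith("stloc.") and code[-2].startswith("ldloc.")
            and code[-3].split(".")[1] == code[-2].split(".")[1]):
        return code[:-3] + ["ret"]
    return code

body = ["ldarg.0", "ldfld Items", "ldarg.0", "ldfld Count",
        "ldc.i4.1", "sub", "ldelem.any", "stloc.0", "ldloc.0", "ret"]
print(drop_result_copy(body))  # ends with [..., 'ldelem.any', 'ret']
```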
Let's see another simple and useful optimization. How would you translate this?
if Value <> 0 then
  DoSomething;
My initial implementation missed the fact that we were comparing for equality against the zero constant, and generated something like this:
    // Load Value...
    ldc.i4.0
    beq @@1
    // DoSomething...
@@1:
Actually, this is a lot better, and it's just what we generate now:
    // Load Value...
    brfalse @@1
    // DoSomething...
@@1:
Despite the fact that the CLR offers a "decent" Boolean type, IL code still allows you to use any nonzero value as an equivalent of true. In this case, we're saving just one byte. But remember: the shorter the code, the faster it loads...
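As a peephole rule, this rewrite amounts to collapsing a zero-constant load followed by a comparison branch into a single truth-test branch. A sketch in Python over (opcode, operand) pairs; the representation is hypothetical, but the opcode pairs are the standard CIL ones:

```python
# Map (constant-load, comparison-branch) pairs to single truth-test branches.
FOLDS = {("ldc.i4.0", "beq"): "brfalse",     # branch if value == 0
         ("ldc.i4.0", "bne.un"): "brtrue"}   # branch if value != 0

def fold_zero_compares(code):
    """code is a list of (opcode, operand) tuples; collapse an ldc.i4.0
    followed by beq/bne.un into brfalse/brtrue with the same target."""
    out = []
    for op, arg in code:
        if out and (out[-1][0], op) in FOLDS:
            prev_op, _ = out.pop()          # drop the constant load
            out.append((FOLDS[(prev_op, op)], arg))
        else:
            out.append((op, arg))
    return out

print(fold_zero_compares([("ldloc.0", None), ("ldc.i4.0", None), ("beq", "@@1")]))
# [('ldloc.0', None), ('brfalse', '@@1')]
```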
