Wednesday, November 21, 2007

Jump!

A fair share of the algorithms implemented by the peephole optimizer has to do with jump optimization. The compiler has already been carefully crafted to produce the best possible code before reaching this pass, but some optimizations may reintroduce small inefficiencies that must be removed one way or another.
This is a summary of the tasks performed by our optimizer:
  1. Chained jump detection
A chained jump is a branch instruction that jumps to an unconditional branch. In some pathological cases, the chain may extend through several jumps. When the optimizer detects such a chain, it replaces all intermediate targets with the target at the end of the chain. We must be careful to avoid circular jumps.
  2. Jumps to return
An unconditional branch may land on a return operation code. Execution time is saved by substituting a return for the original branch.
  3. Common branch suffixes
As already explained here, this is a space-saving optimization that detects repeated code at the end of converging execution paths. Since time is more important than space, this pass is executed after jump-to-return detection.
  4. Unreachable code detection
Code is symbolically executed, marking all instructions reached by some execution path. Then, all unmarked instructions are deleted.
  5. Trivial jump removal
As a side effect of the previous pass, code is compacted and some trivial jump patterns may be introduced. For instance, we could have a jump to the immediately following instruction in the code stream, or a conditional branch skipping an unconditional branch, which can be replaced by a single inverted conditional branch.
  6. Peephole optimization, properly speaking
Now we scan the instruction stream searching for short code sequences that can be simplified. For instance, we look for consecutive assignments to the same field, keeping just the last assignment. Consecutive assignments may be generated by automatic transformations, such as the algorithm for generating iterators. If a local variable is written and then read, and this is the only use of that variable, we can drop both the read and the write operations. This seemingly silly pattern is frequently generated by for/in loops.
  7. Tail recursion
Then, we look for recursive calls immediately before a return operation code. When some safety conditions are met, we scan backwards over the code that pushes the parameters for the call, in order to modify the arguments in the current stack frame, and then we substitute a jump to the beginning of the method for the method call. We never use the tail prefix from the IL instruction set, since it's not efficiently implemented.
  8. Branch operand size tuning
Our last pass assigns an offset to each surviving operation and computes the distance for each remaining jump. Then, branch codes are changed into short jumps whenever possible. A full-size jump requires five bytes, while a short jump is encoded in just two.
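The first two jump passes can be sketched in a few lines. This is a minimal illustration, not Freya's actual internals: it assumes instructions are (opcode, operand) pairs indexed by position, and the function names are hypothetical.

```python
def final_target(code, start):
    """Follow a chain of unconditional branches beginning at `start` and
    return the last target in the chain. The `seen` set breaks cycles,
    so circular jumps cannot hang the optimizer."""
    seen = set()
    target = start
    while target not in seen and code[target][0] == 'br':
        seen.add(target)
        target = code[target][1]
    return target

def collapse_chains(code):
    """Chained jump detection: point every branch at the end of its chain."""
    return [(op, final_target(code, arg)) if op in ('br', 'br.false')
            else (op, arg) for op, arg in code]

def jumps_to_return(code):
    """Jump-to-return: replace a branch that lands on `ret` with `ret`."""
    return [('ret', None) if op == 'br' and code[arg][0] == 'ret'
            else (op, arg) for op, arg in code]
```

For instance, given [('br.false', 2), ('br', 3), ('br', 4), ('ldc', 7), ('ret', None)], collapse_chains retargets the conditional branch at index 0 from the intermediate branch at index 2 straight to index 4, and jumps_to_return then turns the unconditional branch at index 2 into a ret of its own.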


Tuesday, November 20, 2007

Stating the obvious

Should we always state the obvious? Interface types can only declare methods, properties and events (and assertions, since we're talking about Freya!). Fields, in any case, are not allowed inside an interface declaration.
Properties, however, are designed most of the time to resemble fields. Why should we stick to the property keyword when declaring properties for an interface type?
IScene = interface
    Sampler: ISampler; readonly;
    Camera: ICamera; readonly;
    Root: IShape; readonly;
    Lights: Array[ILight]; readonly;

    method Render: PixelMap;
    method Render(Row, Col: Integer): Pixel;
end
Of course, if you feel uneasy about dropping the keyword, you can still use it. Notice that there's no possible confusion, even with the declaration of Lights: that's a property returning an array, not an array property.
By the way, the above code fragment shows another omission: the final semicolon after the type declaration. Freya's syntax has been relaxed to make semicolons optional in several places: after the end keyword in a complex type declaration, and after the statement block in a method implementation. We're not going as far as Eiffel, where semicolons are optional almost everywhere, but the new syntax makes manual translation of C# code into Freya easier.


Common branch suffixes

Another little enhancement has made its way into the Freya optimizer: common branch suffix detection. This technique detects repeated operation codes in code paths that converge at a single point, and moves the branching instruction to take advantage of the common code. It's a trick for reducing code size without affecting its speed, at least not directly. Of course, less code means faster JIT translation and better CPU cache performance.
Let's say we must generate code for this statement:
if condition then
    a.m(p1, p2, q)
else
    a.m(p3, p4, q)
The translation scheme would be as follows:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    EVAL(q)
    callvirt m
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
    EVAL(q)
    callvirt m
@@continue:
It's hard for the compiler to detect the common code in the Abstract Syntax Tree, so this is a task for our peephole optimizer. The optimizer kicks in after each unconditional jump, scanning backwards in parallel from the branching point and from the branch target. The above translation scheme is simplified this way:
    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p1)
    EVAL(p2)
    br @@continue
@@else:
    EVAL(a)
    EVAL(p3)
    EVAL(p4)
@@continue:
    EVAL(q)
    callvirt m
In almost all situations, this change won't affect speed. There is, however, a case where we must be careful: suppose our conditional was the last statement in a method. In that case, the unconditional branch would jump into a return code, and the peephole optimizer would have substituted the jump with a direct return, saving the time consumed by the jump. To prevent this, common branch suffixes are always detected after the jump optimizer has finished its own duty.
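The backward parallel scan itself is simple to sketch. Here is a minimal illustration, assuming the two arms of the conditional are plain instruction lists; a production optimizer would also verify stack depths and intervening labels, which this toy version ignores.

```python
def hoist_common_suffix(then_code, else_code):
    """Scan backwards in parallel over two converging paths and split off
    their longest common suffix, to be emitted once after the join label.
    Returns (then_prefix, else_prefix, shared_suffix)."""
    n = 0
    while (n < len(then_code) and n < len(else_code)
           and then_code[-1 - n] == else_code[-1 - n]):
        n += 1
    if n == 0:
        return then_code, else_code, []
    return then_code[:-n], else_code[:-n], then_code[-n:]
```

Applied to the translation scheme above, the shared EVAL(q)/callvirt tail is split off from both arms and emitted once after @@continue.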
Regarding the impact of this optimization on real-world applications, please note that our example can't benefit from Freya's conditional expressions. What's more, the automatically generated code for iterators typically contains many assignments to the state field that can be factored out with this technique.


Saturday, November 17, 2007

Ranges

In Freya, range expressions may appear in three different contexts:
  1. As case labels.
  2. In for statements.
  3. As the right operand of the in operator.
Of course, the original source was the case statement:
case expression of
    0, 3..20, 23: Console.WriteLine('Howdy');
    50..6000: Console.WriteLine('Silly');
end;
Please note that Freya, imitating C#, also supports case statements over string expressions, but in that case ranges are not allowed, due to the underlying implementation, which depends on hash dictionaries. That's not a satisfactory situation. It could be acceptable for C#, where there are no ranges in switch statements, but not for Freya. The solution, perhaps, could be the adoption of a new data structure for implementing ranges. But that's another story.

Membership

From the very first drafts, Freya featured an in operator, designed to work with collections. An expression like this:
x in A
is translated this way:
ICollection[T](A).Contains(x)
where T is the compile-time type of x. There's even a sort of "contains pattern", along the lines of the "enumerable pattern", which allows any type with a suitable Contains method to be used as the right operand of the in operator.
We evaluated then several techniques for extending the in operator so it could act on ranges. The obvious technique was allowing ranges inside the brackets of array literals, imitating Pascal sets:
x in [1, 2, 5..10, 16]
After some tests, we discarded the idea. Implementing Pascalian bit sets was not an option: it would have required a core Freya assembly, and that's a scary idea. Another problem had to do with the places where you could use an array literal with ranges inside. My intuition was crying out that the solution lay far from adding new data structures, like bit sets or extended enumerable interfaces.
Right now, I think ranges must exist in their own right in the language. The first hint came from the for statement. One day, I found myself writing loops like this:
for i in low .. high do
    Console.WriteLine(i);
Of course it is a cheap syntactic variant of the ascending numerical for, but there was another possible interpretation: the range expression could be seen as an iterator expression, i.e., an IEnumerable[T] expression. Why shouldn't we allow expressions like this?
if i in low..high and
j not in low..high then ...
Each range check is translated as two comparisons. For instance:
// This expression...
x in e1 .. e2
// is translated as
x >= e1 and x <= e2
// and this one...
x + 1/x in e1 .. e2
// is translated as:
using temp := x + 1/x do
temp >= e1 and temp <= e2
As a side effect, we now must allow in with an IEnumerable[T] right operand... expanding it as an inline search loop. But this variant is not implemented yet.
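The temporary variable in the lowering above exists to evaluate the operand exactly once. A small sketch of that behavior, with an illustrative in_range helper standing in for the compiler-generated code (the name is hypothetical, not part of Freya):

```python
def in_range(value, low, high):
    """x in e1 .. e2  ~>  temp >= e1 and temp <= e2, with x bound once."""
    return low <= value <= high

# A call counter demonstrates single evaluation: the (possibly costly
# or side-effecting) operand expression runs once, not twice.
calls = []
def noisy(x):
    calls.append(x)
    return x
```

Since the operand is passed as an already-evaluated argument, in_range(noisy(5), 1, 10) records exactly one call, just as the using expression guarantees in the generated code.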

The existence test

There's an interesting and natural variant of the membership test:
if * in collection then
    // The collection is not empty.
You can also use the negated variant:
if * not in collection then
    // The collection is empty.
In two of the possible cases, the implementation is trivial. For instance, if the existence test is applied to a range, you have a simple comparison involving the boundaries:
// This expression...
* in e1 .. e2
// is translated as
e1 <= e2
When the right side implements the IList[X] generic interface, the test is based on the Count property:
// This expression...
* in list
// is translated as
list.Count > 0
However, there's a case where the existence test is really useful. The right operand may only implement the so-called enumeration pattern, as in the foreach statement, or one of the IEnumerable interfaces. Let's say the right operand implements IEnumerable[X]:
// This expression...
* in enumerable
// is translated as this expression:
using e := enumerable.GetEnumerator do
    e.MoveNext
Please note, first, that we have a using expression here, not a statement. Second, we need a using expression because the value returned by GetEnumerator may implement IDisposable; in that case, we need a resource protection block to dispose the enumerator after the test. Actually, our new existence test saves you from writing a for/in statement:
var result := false;
for v in enumerable
begin
    result := true;
    break
end;
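In Python terms, the enumerable case of the existence test can be sketched like this, a hedged illustration using the iterator protocol in place of GetEnumerator/MoveNext (the exists name is hypothetical):

```python
_MISSING = object()

def exists(enumerable):
    """* in enumerable: true iff the first MoveNext-equivalent succeeds.
    Only one element is ever requested, so the traversal is never run
    to completion for long or infinite sequences."""
    return next(iter(enumerable), _MISSING) is not _MISSING
```

Note that falsy elements still count as present, which is why a sentinel comparison is used instead of a plain truth test on the first element.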
