The Freya Programming Language

Sunday, September 07, 2008

Enter Lyra

Let me state this loud and clear: I'm proud of Freya and the Freya compiler. I think my goals, or almost all of them, has been achieved: there's a working implementation of a powerful general programming language, running on a powerful interesting framework. What's more: I have great plans for Freya, including the addition of some interesting features not found in similar languages.

There's a catch, however: Freya is a complex language, requiring a complex implementation. And I'm virtually alone, by my own decision, developing the language, the compiler and a simple IDE for testing... and that shows. For instance: almost all relevant features from C^# 2.0 has already been included and implemented in Freya, and we already have some features from C^# 3.0, as extension methods, and object and collection initializers. Nevertheless, there are some equally important features still missing, such as anonymous methods and complex type inference for lambdas (type inference for generic method calls, however, is already working). And the worst part is that we'll need to rewrite the type attribution from the compiler in order to accommodate the powerful inference engine required for these enhancements.

So enter Lyra (pronounced as lie-rah). Lyra is a fully developed and independent programming language derived from Freya. Its syntax is almost identical to Freya's, but we have introduced some simplifications, in order to make it easier to understand and implement. These are some of the differences:

Implementation in Lyra takes place along with declaration. No more separated implementation for sections, for including executable content.
Property and event implementations have changed accordingly. The new implementation syntax is very compact and easy to understand, and uses four new special "keywords": get:, set:, add: and remove:. Semicolons are an integral part of the corresponding tokens, and they avoid both those dirty contextual keywords and disturbing valid identifiers in existing code.
Interface delegation is supported, but it has moved to the implementing member declaration.
Constructor calls now requires parentheses, except when followed by a list initializer.
In compensation, constructor calls can now start an object path.
Local variable declarations have been moved to the statement list, as declaration statements. This trick avoids some syntactic anomalies while defining properties and events.

Let's see an example showing how to write properties in Lyra:

property Items[Index: Integer]: Boolean;
get:
    // Compound assignments are not expressions.
    Index -= Min;
    // then is optional before a keyword.
    if Index < 0 or Index > High
        raise new ArgumentException("Index");
    // Symbolic and literal operators are both allowed.
    Result := (Bits[Index div 32] &
        (1 << (Index mod 32))) <> 0
set:
    Index -= Min;
    if Index < 0 or Index > High
        raise new ArgumentException("Index");
    if Value then
        Bits[Index div 32] |= 1 << (Index mod 32)
    else
        Bits[Index div 32] &= ~(1 << (Index mod 32))
end

Of course, we can still use most of the Freya tricks, such as expression-based methods and properties (and operators, and iterators), the abbreviated syntax for constructor chains, implicitly implemented properties (I think Freya/Lyra's trick is better than C^#'s). Assertions are fully supported, even when declared in an interface type. Numeric types are still implicitly extended with static members from the Math class. Lyra, in other words, is as powerful as Freya... but probably more elegant, too.

In the next months, all our research and implementation effort will concentrate on Lyra. These are some of our planned tasks:

Anonymous methods and lambdas are still missing. Probably, I'll implement only one of the syntactic varieties, but I still have some troubles with the syntax: C#'s lambdas, when implemented verbatim, don't make a valid LALR(1) grammar.
I still believe in the usefulness of access sentinels: wrapping code that would be inserted by the compiler around methods that access some class members. This is a safe form of Aspect Oriented Programming (and no, properties don't do the same trick!)
Lyra will feature mixins declared with interfaces. You'll be able to add common implementation methods to interfaces, as in this example:

IStack = interface[X]
    property Count: Integer; readonly;
    method Push(Value: X);
    method Pop: X;

    // These are the new "mixins":

    property IsEmpty: Boolean =>
        Count = 0;

    method Push(Values: IEnumerable[X]);
    begin
        for v in Values do Push(v)
    end

    method Clear;
    begin
        while Count > 0 do Pop
    end
end

It's some kind of funny that extension methods (that's what mixins are, at the end of the day), access sentinels and assertions are related by their implementations. For instance, pre and postconditions are implemented in Freya as extension methods. Class invariants' implementation, on the other hand, uses a similar approach as data sentinels to avoid firing in internal call boundaries.
I must acknowledge that most assertion implementations I know, beyond Eiffel, are shallow, incomplete or plainly wrong. Most of them don't handle inheritance and don't support assertion on interface types. Assertions in Freya addressed these problems, and Lyra will continue enhancing assertions support.
I have to complete the "Intellisense" compiler. This is a second (lighter) compiler that must take care of class navigation and code completion. I can't use the main compiler: it performs some code generation tasks very early in the compiling process.
Of course, Visual Studio integration is still an important item in the shopping list, just as CodeDOM support is.
Compiling must be moved to an isolated application domain, in order to avoid problems when unloading compiled assemblies.
Dynamic methods and "dynamic programming" support, you said? Huh, I hate the very concept! But you were kidding, weren't you?

Let me insist, anyway: Freya development hasn't been abandoned. As soon as Lyra moves to the next level, we'll update Freya in order to keep the pace.

Labels: Lyra, syntax

Mean and lean

Wouldn't be a good thing if you were allowed to define functions like these in C#?

public long fact(int n) =>
    n <= 1 ? 1 : n * fact(n - 1);

operator+(Complex c1, Complex c2) =>
    new Complex(c1.Re + c2.Re, c1.Im + c2.Im);

Actually, you are allowed even riskier tricks when dealing with inline lambdas. The uniformity principle states that, at least, some of these tricks should be accepted while defining regular functions. And that's what Freya does.

By the way, there are no omissions in the operator definition... by Freya standards, of course. Operators are always public and static, so why bother the user with stating this each and every time. On the other hand, the return type has been omitted: it can be statically deduced from the defining expression. Freya has precise rules for these static inferences: you can omit the return type when the expression is a constructor call, a typecast or, recursively, when it's a common expression block wrapping an expression with static inferable return type or a conditional with identical inferred types in both of its branches.

Labels: C#, functional programming, syntax

Wednesday, November 21, 2007

Jump!

A fair share of the algorithms implemented by the peephole optimizer has to do with jump optimization. The compiler has already been carefully crafted to produce the best possible code before reaching this pass, but some optimizations may reintroduce small inefficiencies, that must be solved one way or another.

This is a summary of the tasks performed by our optimizer:

Chained jumps detection

A chained jump is a branch instruction that jumps into an unconditional branch. In some pathological cases, the chain may extend through several jumps. When the optimizer detects such a chain, it substitutes all intermediate targets by the target at the end of the chain. We must be careful in order to avoid circular jumps.

Jumps to return

An unconditional branch may land on a return operation code. Execution time is saved by substituting the original branch with a return.

Common branch suffixes

As already explained here, this is a space-saving optimization that detects repeated code at the end of converging executing paths. Since time is more important than space, this pass is executed after jump to return detection.

Unreachable code detection

Code is symbolically executed, marking all instructions reached by some execution path. Then, all non-marked instructions are deleted.

Trivial jumps removal

As a side effect of the previous pass, code is compacted and some trivial jump patterns may be introduced. For instance, we could have a jump to the immediately next instruction in the code stream, or a conditional branch skipping an unconditional branch, which could be substituted by a single inverted conditional branch.

Peephole optimization, properly speaking

Now we scan the instruction stream searching for some short code sequences that can be simplified. For instance, we look for consecutive assignments on the same field, leaving just the last assignment. Consecutive assignments may be generated by automatic transformations, such as the algorithm for generating iterators. If a local variable is written and then read, and if this is the only use of that local variable, we can drop both the read and the write operations. This seemingly silly pattern is frequently generated by for/in loops.

Tail recursion

Then, we look for recursive calls before a return operation code. When some security conditions are met, we scan back the code that pushes the parameters for that call, in order to modify the arguments in the current stack frame, and then we substitute the method call by a jump to the beginning of the method. We never use the tail prefix from the IL instruction set, since it's not efficiently implemented.

Branch operand size tuning

Our last pass assigns offsets for each surviving operations, and computes distances for each remaining jumps. Then, branch codes are changed into short jumps whenever possible. Full size jumps requires five bytes, while a short jump is codified in just two bytes.

Labels: jumps, optimization

Tuesday, November 20, 2007

Stating the obvious

Should we always state the obvious? Interface types can only declare methods, properties and events (and assertions, since we're talking about Freya!). Fields, in any case, are not allowed inside an interface declaration.

Properties, however, are designed most of the times to resemble fields. Why should we stick to the property keyword when declaring properties for an interface type?

IScene = interface
   Sampler: ISampler; readonly;
   Camera: ICamera; readonly;
   Root: IShape; readonly;
   Lights: Array[ILight]; readonly;

   method Render: PixelMap;
   method Render(Row, Col: Integer): Pixel;
end

Of course, if you feel uneasy about dropping the keyword, you can still use it. Notice that there's no possible confusion, even with the declaration of Lights: that's a property returning an array, not an array property.

By the way, the above code fragment shows another omission: that of the final semicolon after the type declaration. Freya syntax has been relaxed to use optional semicolons in several places: after the end keyword in a complex type declaration and after the statement block in a method implementation. We're not going as far as Eiffel, where semicolons are optional almost everywhere, but our new syntax makes easier manual code translation from C# into Freya.

Labels: interface, property, style, syntax

Common branch suffixes

Another little enhancement has made its way into the Freya optimizer: common branch suffix detection. This technique detects repeated operation codes in code paths converging in a single point, and moves the branching instruction to take advantage of the common code. It's a trick for reducing code size without affecting its speed, at least, in a direct way. Of course, less code means faster JIT translation and better CPU cache performance.

Let's say we must generate code for this statement:

if condition then
   a.m(p₁, p₂, q)
else
   a.m(p₃, p₄, q)

The translation scheme would be as follows:

    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p₁)
    EVAL(p₂)
    EVAL(q)
    callvirt m
    br @@continue
@@else:
    EVAL(a)
    EVAL(p₃)
    EVAL(p₄)
    EVAL(q)
    callvirt m
@@continue:

It's hard for the compiler to detect the common code in the Abstract Syntax Tree, so this is a task for our peephole optimizer. The optimizer kicks in after each unconditional jump, and scans backwards in parallel, from the branching point and the branch target. The above translation scheme is simplified this way:

    EVAL(condition)
    br.false @@else
    EVAL(a)
    EVAL(p₁)
    EVAL(p₂)
    br @@continue
@@else:
    EVAL(a)
    EVAL(p₃)
    EVAL(p₄)
@@continue:
    EVAL(q)
    callvirt m

In almost all situations, this change won't affect speed. There is, however, a case where we must be careful: suppose our conditional was the last statement in a method. In that case, the unconditional branch would jump into a return code, and the peephole optimizer would have substituted the jump with a direct return, saving the time consumed by the jump. To prevent this, common branch suffixes are always detected after the jump optimizer has finished its own duty.

About the impact of this optimization in real-world applications, please note that our example can't benefit from Freya's conditional expressions. What's more, the automatically generated code for iterators typically contains many assignments to the state field that can be factored out with our technique.

Labels: jumps, optimization

Saturday, November 17, 2007

Ranges

Range expressions may appear, in Freya, in three different contexts:

As case labels.
In for statements.
As the right operand of the in operator.

Of course, the original source was the case statement:

case expression of
    0, 3..20, 23: Console.WriteLine('Howdy');
    50..6000: Console.WriteLine('Silly');
end;

Please note that Freya, imitating C#, also supports case statements with string expressions, but in that case, ranges are not allowed, due to the underlying implementation, which depends on hash dictionaries. That's not a satisfactory situation. It could be acceptable for C#, where there're no ranges in its switch, but not for Freya. The solution, perhaps, could be the adoption of a new data structure for implementing ranges. But that's another story.

Membership

From the very first drafts, Freya featured an in operator, designed to work with collections. An expression like this:

x in A

is translated this way:

ICollection[T](A).Contains(x)

where T is the compiling time type of x. There's even a sort of "contains pattern", in the line of the "enumerable pattern", which allows any type with a suitable Contains method to be used as the right operand of the in operator.

We evaluated then several techniques for extending the in operator so it could act on ranges. The obvious technique was allowing ranges inside the brackets of array literals, imitating Pascal sets:

x in [1, 2, 5..10, 16]

After some tests, we discarded the idea. Implementing Pascalian bit sets was not an option: it would have required a core Freya assembly, and that's a scary idea. Another problem had to do with the places you could use an array literal with ranges inside. My intuition was crying aloud that the way was far from adding new data structures, like bitsets or extended enumerable interfaces.

Right now, I think ranges must exists as themselves in the language. The first hint came from the for statement. One day, I found myself writing loops like this:

for i in low .. high do
    Console.WriteLine(i);

Of course it is a cheap syntactic variant of the ascending numerical for, but there was another possible interpretation: the range expression was an iterator expression, i.e., a IEnumerable[T] expression. Why shouldn't we allow expressions like this?

if i in low..high and
   j not in low..high then ...

Each range check is translated as two comparisons. For instance:

// This expression...
x in e₁ .. e₂
// is translated as
x >= e₁ and x <= e₂
// and this one...
x + 1/x in e₁ .. e₂
// is translated as:
using temp := x + 1/x do
    temp >= e₁ and temp <= e₂

As a side effect, we now must allow in with an IEnumerable[T] right operand... expanding it as an inline search loop. But this variant is not implemented yet.

The existence test

There's an interesting and natural variant of the membership test:

if * in collection then
    // The collection is not empty.

You can also use the negated variant:

if * not in collection then
    // The collection is not empty.

In two of the possible cases, the implementation is trivial. For instance, if the existence test is applied to a range, you have a simple comparison involving the boundaries:

// This expression...
* in e₁ .. e₂
// is translated as
e₁ <= e₂

When the right side implements the IList[X] generic interface, the test is based on the Count property:

// This expression...
* in list
// is translated as
list.Count > 0

However, there's a case where the existence test is really useful. The right operand may only implement the so called enumeration pattern, as in the foreach statement, or one of the IEnumerable interfaces. Let's say the right operand implements IEnumerable[X]:

// This expression...
* in enumerable
// is translated as this expression:
using e := enumerable.GetEnumerator do
    e.MoveNext

Please note, first, that we have here a using expression, not a statement. Second: we need a using expression because the value returned by GetEnumerator may implement IDisposable. In that case, we need a resource protection block to dispose the enumerator after the test. Actually, our new existence test is saving you from writing a for/in statement:

var result := false;
for v in enumerable
begin
    result := true;
    break
end;

Labels: iterator, ranges

Wednesday, August 22, 2007

Elisions

Less typing means faster programming, and simple syntactic changes can save a lot of keystrokes. The newest Freya syntax allows you to drop the do and then keywords when they precede a statement starting with another keyword.

In this fragment, both the for and the while statements have dropped their do's:

method GetPrimes(Max: Integer): BitsArray;
begin
    Result := new BitsArray(2, Max);
    for i in 2..Max div 2 : not Result[i]
    begin
        var j := i + i;
        while j <= Max
        begin
            Result[j] := true;
            j += i;
        end;
    end;
    Result.Invert;
end;

Note that we have also dropped the for var combination that used to mark local variable type inference in previous versions.

There is another interesting detail in the above code:

    for i in 2..Max div 2 : not Result[i]

First, there's a seemingly innocent variation of the classical for/to numerical loop: we have disguised it as if we were iterating over a numerical range. We still cannot eliminate the classical statement, since this "iterator" does not substitute the downto loop.

However, the real purpose of this change has to do with the Boolean expression after the semicolon. It acts as a filter, and it can be used both with virtual numerical ranges and with real iterators.

At the end, an iterating filter translate as a nested if statement... but you have saved some typing, and your code is a little more expressive, since you don't have to look for a matching else before understanding the purpose of that nested if.

Labels: style, syntax

Sunday, July 08, 2007

Be constructive

We have already seen how useful can be expression based method implementations. Is there anything similar for constructors? Constructors have no return value, so they cannot directly benefit from expression based implementations... but it turns out there's a surprisingly similar feature at our reach.

In a sense, a constructor can be viewed as a static function (a method returning some value). The returned value is, of course, a new instance from the class the constructor belongs. In any case, there are a couple of constraints for constructors that regular methods have not: the return type must always be the same, inside a given class, and the returned instance must always be a new instance.

Note: It could make sense to remove the latter restriction. It's an important part of the contract for languages with explicit memory deallocation. Now think how you must implement an instance cache for a given class: since you cannot overload the new operator as in classic C++, you must search all creation expressions along the whole application to substitute them with method calls.

Given the previous constraints, and the fact that constructor calls are functional calls, we can find the equivalent to expression-based implementations: chained calls to sibling or parent constructors... a feature that was there all the time:

Stack = class[X]
private
    Items: Array[X];
    Count: Integer;
public
    constructor(Capacity: Integer);
    begin
        Items := new Array[X](Capacity);
    end;

    constructor : Self(128);

    // ...
end;

It's true it looks so different from regular expression-based implementations, but it's a historical consequence: chaining has been available in C++ from long time ago. The only novelty is that you can omit the whole method block from that point on.

It's recommendable to write always this kind of "bodiless" chaining in the class declaration, instead of using an implementation section. Neither C# nor Freya supports default parameter values, so it's very frequent to find a handful of constructors in a given type, each of them removing some parameters from the signature's tail. We can use chaining to implement all of them, except the first:

Metal = sealed class(IMaterial)
public
    constructor(Color: Pixel;
        MinRefl, MaxRefl, Diffuse: Double;
        PhongAmount, PhongSize: Double);

    constructor(Color: Pixel;
        MinRefl, MaxRefl, Diffuse: Double) :
        Self(Color, MinRefl, MaxRefl, Diffuse, 0, 1);
    constructor(Color: Pixel;
        MinRefl, MaxRefl: Double) :
        Self(Color, MinRefl, MaxRefl, 1, 0, 1);
    constructor(Color: Pixel; MinRefl: Double) :
        Self(Color, MinRefl, 1, 1, 0, 1);
    constructor(Color: Pixel) :
        Self(Color, 0.1, 1, 1, 0, 1);
end;

There's a second related question: is there any similar feature for iterators? I have an affirmative answer, but the syntactic details still need some boiling...

Though it's not an official Freya feature, it could also make sense to bring back another old C++ feature: field initialization in constructors. We could have written the previously shown Stack constructors like this:

// This is not Freya... yet...
constructor(Capacity: Integer)
    : Items(new Array[X](Capacity));
constructor
    : Self(128);

Of course, this enhances the expressivity of chaining, but that's not the main point. The main reason to bring this feature back has to do with programming by contract: this kind of field initialization may be necessary for implementing not nullable reference types, as in Spec#.

Labels: chaining, constructor, creation, style

Saturday, July 07, 2007

Notepad Oriented Programming

One of the shameless goals of Freya is to become a Notepad Oriented Programming Language: you must be able to write Freya applications with the Notepad and little more (Reflector, perhaps?). But that's a pretty hard goal when you're designing a language inspired by the Algol/Pascal lineage. It's not only that you must use those begin/end blocks instead of curly braces. If you must keep method declarations apart from their implementations, as in Delphi, then you'll have to type a lot... or lean on an editor that replicates the missing implementations by user request.

I think we have achieved the above stated goal. Right now, you can write Freya code that looks as compact and easy as C# code, and in some circumstances, the Freya variant may be even shorter than the C# equivalent. To illustrate this, let's take a look at C# and Freya operators. Let's say we're writing a Complex class in C#. Here you have two user defined operators on that class:

// C#
public static Complex operator+(Complex c1, Complex c2)
{
    return new Complex(c1.Re + c2.Re, c1.Im + c2.Im);
}

public static Complex operator-(Complex c1, Complex c2)
{
    return new Complex(c1.Re - c2.Re, c1.Im - c2.Im);
}

In a first attempt, those two operators would be translated to Freya this way:

public
    static operator+(c1, c2: Complex): Complex;
    begin
        Result := new Complex(
            c1.Re + c2.Re, c1.Im + c2.Im);
    end;

    static operator-(c1, c2: Complex): Complex;
    begin
        Result := new Complex(
            c1.Re - c2.Re, c1.Im - c2.Im);
    end;

There's nothing to be especially proud of in the above example: our fragment is longer in Freya than in C#. It's true that we have spared ourselves from some Delphi.NET eccentricities. For instance, we have written inline implementations, avoiding those nasty duplications imposed by the interface/implementation artificial split. We have also saved something in constructor calls: we use new as in C#, instead of calling some named constructor, as Delphi requires. It has nothing to do with saving two or three characters in each call (we're loosing that tiny advantage by using static instead of class, as in Delphi), but our syntax makes easier to translate existing code from C# to Freya. Last, but no least, we can use symbolic names for the operators, instead of Add and Subtract, as in Delphi or Chrome.

Expression based implementations will let us simplify the above listing:

public
    operator+(c1, c2: Complex): Complex =>
        new Complex(c1.Re + c2.Re, c1.Im + c2.Im);

    operator-(c1, c2: Complex): Complex =>
        new Complex(c1.Re - c2.Re, c1.Im - c2.Im);

We have deleted both begin/end blocks, and both assignations to Result. Since operators are always public and static in .NET, we have also dropped the static modifier. We now have code comparable to C# in length, and maybe even shorter. But we can keep shortening our example:

public
    operator+(c1, c2: Complex) => new Complex(
        c1.Re + c2.Re, c1.Im + c2.Im);

    operator-(c1, c2: Complex) => new Complex(
        c1.Re - c2.Re, c1.Im - c2.Im);

We are showing the last addition to Freya: return type inference for expression-based implementations. This is not a full featured type inference, as in functional languages or a modern language as Nemerle. The Freya compiler only allows the omission of the return type when it finds an expression based implementation, and when the implementing expression is an instantiation expression. We think that complex inferences are not a good thing, at least with a language as Freya, so we have added some inference... up to a sensible point.

We can use return type inference in yet another case, as this example shows:

operator/(c1, c2: Complex) =>
    using r2 := c2.Re.Sqr + c2.Im.Sqr do
        new Complex(
            (+c1.Re * c2.Re + c1.Im * c2.Im) / r2,
            (-c1.Re * c2.Im - c1.Im * c2.Re) / r2);

In this case, we have a common expression block containing a new expression, so we can safely deduce the return type before resolving the whole expression.

Of course, you won't be forced to write code in this style: if you want your code to look like good old Pascal, you still can write it that way... and I'm not being sarcastic. There's an important problem with compact code: how you get there. When you write programs by assembling small pieces into bigger ones, the result will contain extra code and glue that you probably won't need. The most frequent reason has to do with the fact that modular code, as you store it in your mental pattern library, must deal with a yet unknown context, so it probably has extra checks for handling extreme cases and such. When you adapt those code pieces for a given task, some special cases render improbable, and the corresponding guarding code can be deleted.

A second source of redundant coding is gluing. How do you compute the square root of the Zipperstein-Marmaduke Formula? First, you must evaluate the ZMF and then you'll have to find that pesky square root. In your first attempt, it's highly probable that the ZMF return value was stored in a local variable. There are two possibilities about the final code: either you can keep a separate line for dealing with ZMF, just to keeping what your code does, or you can merge both computations in a single expression. It's up to you to decide.

The consequence: compact code may be easier to read than to write... as the failure of functional programming languages to gain enough users has shown. Freya doesn't require you to write the shortest possible code, but that's still an option you have once you master the language.

... by the way, now we can write the Ray class as follows:

Ray = sealed class
public
    Origin, Direction: Vector;

    property Items[Time: Double] => new Vector(
        Origin.X + Time * Direction.X,
        Origin.Y + Time * Direction.Y,
        Origin.Z + Time * Direction.Z);
end;

Labels: functional programming, style