June 2010

Volume 25 Number 06

CLR Inside Out - F# Fundamentals

By Luke Hoban | June 2010

F# is a new, functional and object-oriented programming language for the Microsoft .NET Framework, and it’s integrated into this year’s release of Microsoft Visual Studio 2010. F# combines simple, succinct syntax with strong static typing, and it scales from lightweight explorative programming in the F# Interactive up to large-scale .NET Framework-based component development with Visual Studio.

F# is designed from the ground up to run on the CLR. As a .NET Framework-based language, F# leverages the rich libraries available on the .NET Framework platform, and can be used to build .NET libraries or implement .NET interfaces. F# also takes advantage of many of the CLR core features, including generics, garbage collection, tail call instructions and the fundamental Common Language Infrastructure (CLI) type system.

This article takes a look at some of the core concepts of the F# language and its implementation on top of the CLR.

A Quick Look at F#

Let’s start with a brief look at a number of the core language features in F#. For more details on any of these features and the many other interesting concepts in the F# language, see the documentation available via the F# Developer Center at fsharp.net.

The most fundamental feature of F# is the let keyword, which binds a value to a name. Let can be used to bind both data and function values, and for both top-level and local bindings:

let data = 12
 
let f x = 
    let sum = x + 1 
    let g y = sum + y*y 
    g x

F# provides a few core datatypes and a language syntax for working with structured data, including lists, typed optional values and tuples:

let list1 = ["Bob"; "Jom"]

let option1 = Some("Bob")
let option2 = None

let tuple1 = (1, "one", '1')

These pieces of structured data, and others, can be matched against by using F# pattern matching expressions. Pattern matching is similar to using switch statements in C-like languages, but provides a richer way to both match and extract parts out of matched expressions, somewhat akin to the way regular expressions are used for pattern-matching strings:

let person = Some ("Bob", 32)

match person with
| Some(name,age) -> printfn "We got %s, age %d" name age
| None -> printfn "Nope, got nobody"

F# leverages the .NET Framework libraries for many tasks, such as accessing data from a rich variety of data sources. .NET libraries can be used from F# in the same way they are used in other .NET languages:

let http url = 
    let req = WebRequest.Create(new Uri(url))
    let resp = req.GetResponse()
    let stream = resp.GetResponseStream()
    let reader = new StreamReader(stream)
    reader.ReadToEnd()

F# is also an object-oriented language and can define any .NET class or struct, similar to C# or Visual Basic:

type Point2D(x,y) = 
    member this.X = x
    member this.Y = y
    member this.Magnitude = 
        x*x + y*y
    member this.Translate(dx, dy) = 
        new Point2D(x + dx, y + dy)

In addition, F# supports two special kinds of types: records and discriminated unions. Records provide a simple representation of data values with named fields, and discriminated unions are an expressive way to represent types that can have a number of different kinds of values, with different associated data in each kind:

type Person = 
    { Name : string;
      HomeTown : string;
      BirthDate : System.DateTime }

type Tree = 
    | Branch of Tree * Tree
    | Leaf of int

F# on the CLR

F# is in many ways a higher-level language than C#, with its type system, syntax and language constructs being further away from the metadata and intermediate language (IL) of the CLR. This has a few interesting implications. Most importantly, it means F# developers can often solve problems and think about their programs at a higher level, closer to the domain of the problem at hand. But it also means the F# compiler does more work in mapping F# code onto the CLR, and that the mapping is less direct.

The C# 1.0 compiler and the CLR were developed at the same time, and the features of both were closely aligned. Almost all C# 1.0 language constructs have a very direct representation in the CLR type system and in CIL. This has become less true in later C# releases as the C# language evolved faster than the CLR itself. Iterators and anonymous methods were fundamental C# 2.0 language features that didn’t have direct CLR equivalents. In C# 3.0, query expressions and anonymous types followed this trend.

F# takes this a step further. Many of the language constructs don’t have direct IL equivalents, so features like pattern matching expressions get compiled into a rich set of IL instructions used to accomplish the pattern matching efficiently. F# types such as records and unions automatically generate many of the members needed.

Note, however, that I’m discussing the compilation techniques used by the current F# compiler. Many of these implementation details are not directly visible to the F# developer and could be modified in future versions of the F# compiler for performance optimizations or to enable new features.

Immutable By Default

The basic let binding in F# is similar to var in C#, except for one very important difference: you can’t change the value of a let-bound name later. That is, values are immutable by default in F#:

let x = 5
x <- 6 // error: This value is not mutable

Immutability has big benefits for concurrency because there is no need to worry about locking when using immutable state—it can be safely accessed from multiple threads. Immutability also tends to decrease coupling between components. The only way for one component to influence another is to make an explicit call to the components.

Mutability can be opted into in F#, and is often used when calling other .NET libraries, or to optimize particular code paths:

let mutable y = 5
y <- 6

Similarly, types in F# are immutable by default:

let bob = { Name = "Bob"; 
            HomeTown = "Seattle" }
// error: This field is not mutable
bob.HomeTown <- "New York" 

let bobJr = { bob with HomeTown = "New York" }

In this example, when mutation is not available, it’s common to instead use copy-and-update to make a new copy from an old one while changing one or more fields. Although a new object is created, it shares many pieces with the original. In this example, only a single string—“Bob”—is needed. This sharing is an important part of the performance of immutability.

Sharing can also be seen in F# collections. For example, the F# list type is a linked-list data structure that can share a tail with other lists:

let list1 = [1;2;3]
let list2 = 0 :: list1
let list3 = List.tail list1

Because of the copy-and-update and sharing inherent in programming with immutable objects, the performance profile of these programs is often quite different from typical imperative programs.

The CLR plays a big role here. Immutable programming tends to create more short-lived objects as a result of transforming data rather than changing it in place. The CLR garbage collector (GC)deals well with these. Short-lived small objects are relatively very cheap due to the generational mark-and-sweep used by the CLR GC.

Functions

F# is a functional language and, not surprisingly, functions play an important role throughout the language. Functions are a first-class part of the F# type system. For example, the type “char -> int” represents F# functions that take a char and return an int.

Although similar to .NET delegates, F# functions have two important differences. First, they’re not nominal. Any function that takes a char and returns an int is of type “char -> int”, whereas multiple differently named delegates may be used to represent functions of this signature, and are not interchangeable.

Second, F# functions are designed to efficiently support either partial or full application. Partial application is when a function with multiple parameters is given only a subset of the parameters, thus resulting in a new function that takes the remaining parameters.

let add x y = x + y

let add3a = add 3
let add3b y = add 3 y
let add3c = fun y -> add 3 y

All first-class F# function values are instances of a type FSharpFunc<, > as defined in the F# runtime library, FSharp.Core.dll. When using an F# library from C#, this is the type that all F# function values taken as parameters or returned from methods will have. This class looks roughly like the following (if you were defining it in C#):

public abstract class FSharpFunc<T, TResult> {
    public abstract TResult Invoke(T arg);
}

Note in particular that all F# functions fundamentally take a single argument and produce a single result. This captures the concept of partial application—an F# function with multiple parameters will actually be an instance of a type like:

FSharpFunc<int, FSharpFunc<char, bool>>

That is, a function that takes an int and returns another function, which itself takes a char and returns a bool. The common case of full application is made fast by using a set of helper types in the F# core library.

When an F# function value is created using a lambda expression (the fun keyword), or as a result of a partial application of another function (as in the add3a case shown earlier), the F# compiler generates a closure class:

internal class Add3Closure : FSharpFunc<int, int> {
    public override int Invoke(int arg) {
        return arg + 3;
    }
}

These closures are similar to closures created by the C# and Visual Basic compilers for their lambda expression constructs. Closures are one of the most common compiler-generated constructs on the .NET Framework platform that do not have direct CLR-level support. They exist in almost all .NET programming languages and are used especially heavily by F#.

Function objects are common in F#, so the F# compiler uses many optimization techniques to avoid the need to allocate these closures. Using inlining, lambda-lifting, and direct representation as .NET methods when possible, the internal code generated by the F# compiler will often look somewhat different than described here.

Type Inference and Generics

One notable feature of all the code examples so far is the lack of any type annotation. Although F# is a statically typed programming language, explicit type annotations are often not needed because F# makes extensive use of type inference.

Type inference will be familiar to C# and Visual Basic developers who use it for local variables, as in this C# 3.0 code:

var name = "John";

The let keyword in F# is similar, but type inference in F# goes substantially further, applying also to fields, parameters and return types. In the following example, the two fields x and y are inferred to have type int, which is the default for the + and * operators used on these values within the body of the type definition. The Translate method is inferred to have type “Translate : int * int -> Point2D”:

type Point2D(x,y) = 
    member this.X = x
    member this.Y = y
    member this.Magnitude = 
        x*x + y*y
    member this.Translate(dx, dy) = 
        new Point2D(x + dx, y + dy)

Of course, type annotations can be used when needed or desired to tell the F# compiler what type is really expected for a certain value, field or parameter. This information will then be used for type inference. For example, you can change the definition of Point2D to use float instead of int by adding just a couple of type annotations:

type Point2D(x : float,y : float) = 
    member this.X = x
    member this.Y = y
    member this.Magnitude = 
        x*x + y*y
    member this.Translate(dx, dy) = 
        new Point2D(x + dx, y + dy)

One of the important results of type inference is that functions not tied to a specific type are automatically generalized to be generic functions. So your code will become as generic as possible without you needing to explicitly specify all the generic types. This causes generics to play a fundamental role in F#. The compositional style of functional programming with F# also encourages small reusable pieces of functionality, which benefit greatly from being as generic as possible. The ability to author generic functions without the complex type annotations is an important feature of F#.

For example, the following map function walks a list of values and generates a new list by applying its argument function f to each element:

let rec map f values = 
    match values with
    | [] -> []
    | x :: rest -> (f x) :: (map f rest)

Note that there are no type annotations needed, but the type inferred for map is “map : (‘a -> ‘b) -> list<’a>  -> list<’b>”. F# is able to infer from the use of pattern matching, and from the use of the parameter f as a function, that the types of the two parameters have a certain shape, but are not completely fixed. So F# makes the function as generic as possible while still having the types needed by the implementation. Note that generic parameters in F# are indicated using a leading ‘ character, to distinguish them syntactically from other names.

Don Syme, the designer of F#, was previously the lead researcher and developer on the implementation of generics in the .NET Framework 2.0. The concept of a language like F# critically depends on having generics in the runtime, and Syme’s interest in doing F# came in part from wanting to really take advantage of this CLR feature. F# leverages .NET generics heavily; for example, the implementation of the F# compiler itself has more than 9,000 generic type parameters.

Ultimately, type inference is just a compile-time feature, though, and every piece of F# code gets an inferred type that’s encoded in the CLR metadata for an F# assembly.

Tail Calls

Immutability and functional programming tend to encourage the use of recursion as a computational tool in F#. For example, an F# list can be walked and the sum of the squares of the values in the list collected using a simple piece of recursive F# code:

let rec sumOfSquares nums =
    match nums with
    | [] -> 0
    | n :: rest -> (n*n) + sumOfSquares rest

While recursion is often convenient, it can use a lot of space on the call stack because each iteration adds a new stack frame. For sufficiently large inputs this can even lead to stack-overflow exceptions. To avoid this stack growth, recursive code can be written tail-recursively, meaning that recursive calls are always the last thing done, just before the function returns:

let rec sumOfSquaresAcc nums acc = 
    match nums with 
    | [] -> acc
    | n :: rest -> sumOfSquaresAcc rest (acc + n*n)

The F# compiler implements tail-recursive functions using two techniques that aim to ensure the stack will not grow. For direct tail calls to the same function being defined, such as the call to sumOfSquaresAcc, the F# compiler automatically converts the recursive call into a while loop, thus avoiding making any call at all, and generating code very similar to an imperative implementation of the same function.

Tail recursion is not always as simple as this, though, and can instead be a result of multiple mutually recursive functions. In this case, the F# compiler relies on the CLR native support for tail calls.

The CLR has an IL instruction specifically to help with tail recursion: the tail. IL prefix. The tail. instruction tells the CLR it can discard the caller’s method state prior to making the associated call. This means that the stack will not grow when taking this call. It also means, at least in principle, that it may be possible for the JIT to make the call efficiently using just a jump instruction. This is useful for F#, and ensures that tail recursion is safe in 
almost all cases:

IL_0009:  tail.
IL_000b:  call    bool Program/SixThirtyEight::odd(int32)
IL_0010:  ret

In CLR 4.0, a few key improvements have been made to the treatment of tail calls. The x64 JIT had previously implemented tail calls very efficiently, but using a technique that could not be applied to all cases where the tail. instruction appeared. This meant some F# code that ran successfully on x86 platforms would fail with a stack overflow on x64 platforms. In CLR 4.0, the x64 JIT extends its efficient implementation of tail calls to more cases, and also implements the higher-overhead mechanism needed to ensure that tail calls are taken anytime they would be on the x86 JIT.

A detailed account of the CLR 4.0 improvements for tail calls is available on the CLR Code Generation blog (blogs.msdn.com/clrcodegeneration/archive/2009/05/11/tail-call-improvements-in-net-framework-4.aspx).

F# Interactive

F# Interactive is a command-line tool and Visual Studio tool window for interactively executing F# code (see Figure 1). This tool makes it easy to experiment with data, explore APIs and test application logic using F#. F# Interactive is made possible by the CLR Reflection.Emit API. This API allows a program to generate new types and members at run time and call into this new code dynamically. F# Interactive uses the F# compiler to compile code the user inputs at the prompt, then uses Reflection.Emit to generate the types, functions and members instead of writing an assembly to disk.

Executing Code in F# Interactive
Figure 1 Executing Code in F# Interactive

One key result of this approach is that the user code being executed is fully compiled and fully JITed, including all the useful optimizations in both of these steps, instead of being an interpreted version of F#. That makes the F# Interactive an excellent, high-performance environment for trying out new problem-solving approaches and interactively exploring large datasets.

Tuples

Tuples in F# provide a simple way to package data and pass it around as a unit, without needing to define new custom types or use complicated parameter schemes such as out parameters to return multiple values.

let printPersonData (name, age) = 
    printfn "%s is %d years old" name age

let bob = ("Bob", 34)

printPersonData bob

    
let divMod n m = 
    n / m, n % m

let d,m = divMod 10 3

Tuples are simple types, but have a few important properties in F#. Most significantly, they’re immutable. Once constructed, the elements of a tuple cannot be modified. This allows tuples to be safely treated as just a combination of their elements. It also enables another important feature of tuples: structural equality. Tuples and other F# types such as  lists, options, and user-defined records and unions are compared for equality by comparing their elements.

In the .NET Framework 4, tuples are now a core datatype defined in the base class libraries. When targeting the .NET Framework 4, F# uses the System.Tuple type to represent these values. Having support for this core type in mscorlib means F# users can easily share tuples with C# APIs and vice versa.

Although tuples are conceptually simple types, there are many interesting design decisions involved in building the System.Tuple type. Matt Ellis covered the design process for Tuple in detail in a recent CLR Inside Out column (msdn.microsoft.com/magazine/dd942829).

Optimizations

Because F# translates less directly to the CLR instructions, there’s more room for optimization to be done in the F# compiler instead of just relying on the CLR JIT compiler. The F# compiler takes advantage of this and implements more significant optimizations in Release mode than the C# and Visual Basic compilers.

One simple example is intermediate tuple elimination. Tuples are frequently used to structure data while it’s being processed. It’s common for tuples to be created and then deconstructed within a single function body. When this happens, there’s an unnecessary allocation of a tuple object. Because the F# compiler knows that creating and deconstructing a tuple can’t have any important side effects, it will attempt to avoid allocating the intermediate tuple.

In this example, no tuple object needs to be allocated, as it is used only by being deconstructed in the pattern match expression:

let getValueIfBothAreSame x y = 
    match (x,y) with
    | (Some a, Some b) when a = b -> Some a
    |_ -> None

Units of Measure

Units of measure, like meters and seconds, are commonly used in science, engineering and simulation, and are fundamentally a type system for working with numerical quantities of different kinds. In F#, units of measure are brought into the language’s type system directly so that numerical quantities can be annotated with their units. These units are carried through computations, and errors are reported when units do not match. In the following example, it’s an error to try to add kilograms and seconds, though note that it’s not an error to divide kilograms by seconds.

[<Measure>] type kg
/// Seconds
[<Measure>] type s
    
let x = 3.0<kg>
//val x : float<kg>

let y = 2.5<s>
// val y : float<s>

let z = x / y
//val z : float<kg/s>

let w = x + y
// Error: "The unit of measure 's' 
// does not match the unit of measure 'kg'"

Units of measure become a fairly lightweight addition thanks to F# type inference. Using type inference, user-provided unit annotations need to appear only on literals and when accepting data from outside sources. Type inference then propagates these through the program, and checks that all computations are being done correctly according to the units being used.

Although part of the F# type system, units of measure are erased at compilation time. This means the resulting .NET assembly does not include the information about units, and the CLR just treats unitized values as their underlying type—thereby incurring no performance overhead. This is in contrast to .NET generics, which are fully available at run time.

If, in the future, the CLR were to integrate units of measure into the core CLR type system, F# would be able to expose the unit information so it could be seen from other.NET programming languages.

Get Interactive with F#

As you’ve seen, F# provides an expressive, functional, object-oriented and explorative programming language for the .NET Framework. It’s integrated into Visual Studio 2010—including the F# Interactive tools for jumping straight in and experimenting with the language.

The language and tools leverage the full breadth of the CLR and introduce some higher-level concepts that are mapped onto the metadata and IL of the CLR. Yet F# is ultimately just another .NET language and can be easily incorporated as a component of new or existing .NET projects, thanks to the common type system and runtime.    


Luke Hoban is the program manager for the F# team at Microsoft. Before moving to the F# team, he was the program manager for the C# compiler and worked on C# 3.0 and LINQ.