Internals

04 Mar 2015 por asterite

Let’s talk about what Crystal does with your code: how it represents types in memory, how it does method lookup at runtime, how it does method dispatch, etc. When using a programming language it’s always useful to know this in order to structure our code in the most efficient way, and to precisely understand what our code will be transformed to.

How types are represented in memory

For talking about type representations we will use C and LLVM IR code, so be sure to check the reference.

Crystal has built-in types, user defined types and unions.

Built-in types

These are Nil, Bool, Char and the various number types (Int32, Float64, etc.), Symbol, Pointer, Tuple, StaticArray, Enum, Proc and Class.

Let’s check how a Bool is represented. For this, let’s write this small program:

# test.cr
x = true

To see the generated LLVM we can use this command:

crystal build test.cr --emit llvm-ir --prelude=empty

The --emit llvm-ir flag tells the compiler to dump the resulting LLVM IR code to a test.ll file. The --prelude=empty tells the compiler to not use the default prelude file, which, for example, initializes the GC.

In this way we can get a very simple and clean LLVM IR code file with just the code we write:

; ModuleID = 'main_module'

%String = type { i32, i32, i32, i8 }

@symbol_table = global [0 x %String*] zeroinitializer

define i1 @__crystal_main(i32 %argc, i8** %argv) {
alloca:
  %x = alloca i1
  br label %entry

entry:                                            ; preds = %alloca
  store i1 true, i1* %x
  ret i1 true
}

declare i32 @printf(i8*, ...)

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %0 = call i1 @__crystal_main(i32 %argc, i8** %argv)
  ret i32 0
}

The gist is in __crystal_main: we can see the compiler allocates an i1 in the stack for x and then stores true in it. That is, the compiler represents a Bool as a single bit, which is pretty efficient.

Let’s do the same for an int:

x = 1

For x this time we will get:

%x = alloca i32

In LLVM, and i32 is an int represented with 32 bits, which, again, is pretty efficient and what we would expect the representation of Int32 to be.

That is, even though Crystal is object-oriented and an Int32 behaves like an object (it has methods), its internal representation is as efficient as possible.

Symbol

Let’s see Symbol now:

x = :one
y = :two

Let’s see the full LLVM IR code this time:

; ModuleID = 'main_module'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin14.1.0"

%String = type { i32, i32, i32, i8 }

@one = private constant { i32, i32, i32, [4 x i8] } { i32 1, i32 3, i32 3, [4 x i8] c"one\00" }
@two = private constant { i32, i32, i32, [4 x i8] } { i32 1, i32 3, i32 3, [4 x i8] c"two\00" }
@symbol_table = global [2 x %String*] [%String* bitcast ({ i32, i32, i32, [4 x i8] }* @one to %String*), %String* bitcast ({ i32, i32, i32, [4 x i8] }* @two to %String*)]

define internal i32 @__crystal_main(i32 %argc, i8** %argv) {
alloca:
  %x = alloca i32
  %y = alloca i32
  br label %entry

entry:                                            ; preds = %alloca
  store i32 0, i32* %x
  store i32 1, i32* %y
  ret i32 1
}

declare i32 @printf(i8*, ...)

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %0 = call i32 @__crystal_main(i32 %argc, i8** %argv)
  ret i32 0
}

Three things are important here. First, we can see that a Symbol is represented as i32, that is, with four bytes. Second, we can see x is assigned a value of 0 and y is assigned a value of 1. Third, we can see some constants at the top: hello, bye and symbol_table.

Basically, a Symbol in Crystal is just a name assigned to a unique number. A Symbol can’t be created dynamically (there’s no String#to_sym) and the only way to create them is with their literal value, so the compiler can know all the symbols used across the program. The compiler assigns a number to each of them, starting from zero, and it also builds a table that map their number to a string, to be able to implement Symbol#to_s in a very efficient way. This makes symbols very attractive to use for small groups of constants, because it’s like using magic numbers but with names instead.

Pointer

A Pointer is a generic type that represents a typed pointer to some memory location. For example:

x = Pointer(Int32).malloc(1_u64)
x.value = 1
x.value #=> 1

If you look at the generated LLVM IR code you will see a bunch of code. First, x is represented like this:

%x = alloca i32*

Again, this is just a pointer to an int32, as it should be. Next you will see a call to malloc (will ask memory from the GC using the regular prelude) and memset to clear the memory, and then some instructions to assign 1 in that memory address. This is not very important for the subject of this blog post, so we skip it, but it’s important to know that generated code is very similar to what would be generated in C.

Tuple

A Tuple is a fixed-size, immutable sequence of values, where the types at each position are known at compile time.

x = {1, true}

Pieces of LLVM IR code for the above are:

%"{Int32, Bool}" = type { i32, i1 }
...
%x = alloca %"{Int32, Bool}"

As we can see, a tuple is represented as an LLVM structure, which just packs values sequentially. This representation of tuples allows us, for example, to decompose an Int32 into its bytes, in this way:

x = 1234
ptr = pointerof(x) as {UInt8, UInt8, UInt8, UInt8}*
puts ptr.value #=> {21, 205, 91, 7}

StaticArray

A StaticArray is a fixed-size, mutable sequence of values of a same type, allocated on the stack and passed by value. The prelude includes safe ways to create them, but since we are using a bare-bones prelude an unsafe (will be initialized to data containing garbage) way to create them is this:

x = uninitialized Int32[8]

Its LLVM representation:

%x = alloca [8 x i32]

We won’t talk much more about this type because it’s not used that much, mostly for IO buffers and such: Array is the recommended type for all other operations.

Enum

Here’s an enum:

enum Color
  Red
  Green
  Blue
end

x = Color::Green

An enum is, in a way, similar to Symbol: numbers associated to names so we can use names in our code instead of magic numbers. As expected, an enum is represented as an i32, that is four bytes, unless specified otherwise in its declaration:

enum Color : UInt8
  Red
  Green
  Blue
end

The nice thing about enums is that you can print them and you get their name, not their value:

puts Color::Green #=> Green

This is done in a different way than with Symbol, using compile-time reflection and macros. But, basically, an enum’s to_s method is generated only when needed. But it’s nice that an enum is memory and speed efficient and also comfortable to use and to debug with (like, you get names instead of numbers when printing them).

Proc

A Proc is a function pointer with an optional closure data information. For example:

f = ->(x : Int32) { x + 1 }

This is a function pointer that receives an Int32 and returns an Int32. Since it doesn’t capture any local variables it’s not a closure. But the compiler still represents it like this:

%"->" = type { i8*, i8* }

That is a pair of pointers: one containing the pointer to the real function, another one containing a pointer to the closured data.

The LLVM IR code for the above is:

; ModuleID = 'main_module'

%String = type { i32, i32, i32, i8 }
%"->" = type { i8*, i8* }

@symbol_table = global [0 x %String*] zeroinitializer

define %"->" @__crystal_main(i32 %argc, i8** %argv) {
alloca:
  %f = alloca %"->"
  %0 = alloca %"->"
  br label %entry

entry:                                            ; preds = %alloca
  %1 = getelementptr inbounds %"->"* %0, i32 0, i32 0
  store i8* bitcast (i32 (i32)* @"~fun_literal_1" to i8*), i8** %1
  %2 = getelementptr inbounds %"->"* %0, i32 0, i32 1
  store i8* null, i8** %2
  %3 = load %"->"* %0
  store %"->" %3, %"->"* %f
  ret %"->" %3
}

declare i32 @printf(i8*, ...)

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %0 = call %"->" @__crystal_main(i32 %argc, i8** %argv)
  ret i32 0
}

define internal i32 @"~fun_literal_1"(i32 %x) {
entry:
  %0 = add i32 %x, 1
  ret i32 %0
}

A bit harder to digest than the above examples, but it’s basically assining a pointer to ~fun_literal_1 in the first position and null in the second. If our Proc captures a local variable:

a = 1
f = ->(x : Int32) { x + a }

The LLVM IR code changes:

; ModuleID = 'main_module'

%String = type { i32, i32, i32, i8 }
%"->" = type { i8*, i8* }
%closure = type { i32 }

@symbol_table = global [0 x %String*] zeroinitializer

define %"->" @__crystal_main(i32 %argc, i8** %argv) {
alloca:
  %f = alloca %"->"
  %0 = alloca %"->"
  br label %entry

entry:                                            ; preds = %alloca
  %malloccall = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
  %1 = bitcast i8* %malloccall to %closure*
  %a = getelementptr inbounds %closure* %1, i32 0, i32 0
  store i32 1, i32* %a
  %2 = bitcast %closure* %1 to i8*
  %3 = getelementptr inbounds %"->"* %0, i32 0, i32 0
  store i8* bitcast (i32 (i8*, i32)* @"~fun_literal_1" to i8*), i8** %3
  %4 = getelementptr inbounds %"->"* %0, i32 0, i32 1
  store i8* %2, i8** %4
  %5 = load %"->"* %0
  store %"->" %5, %"->"* %f
  ret %"->" %5
}

declare i32 @printf(i8*, ...)

declare noalias i8* @malloc(i32)

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %0 = call %"->" @__crystal_main(i32 %argc, i8** %argv)
  ret i32 0
}

define internal i32 @"~fun_literal_1"(i8*, i32 %x) {
entry:
  %1 = bitcast i8* %0 to %closure*
  %a = getelementptr inbounds %closure* %1, i32 0, i32 0
  %2 = bitcast i8* %0 to %closure*
  %3 = load i32* %a
  %4 = add i32 %x, %3
  ret i32 %4
}

This is even harder to digest, but basically some memory is asked that will contain the value of the variable a, and the Proc recieves it and uses it. In this case the memory is asked with malloc, but with the regular prelude the memory will be allocated by the GC and released when no longer needed.

Class

Classes are objects too:

x = Int32

Not surprisingly, a class is represented as an Int32:

%x = alloca i32
...
store i32 45, i32* %x

Because classes can’t be created at runtime, and the compiler knows all classes, it assigns a type id to them and that way it can identify them.

User-defined types

Users can define classes and structs. The difference is that newing a class allocates it on the heap, and a pointer to that data is passed across variables and methods, while newing a struct allocates that memory on the stack and the whole struct’s value is passed, copied, across variables and methods.

Let’s try it:

class Point
  def initialize(@x, @y)
  end
end

x = Point.new(1, 2)

The LLVM IR code contains:

%Point = type { i32, i32, i32 }
...
%x = alloca %Point*

Mmm… wait! A Point has just two instance variables, @x and @y, both of type Int32, so why there’s another i32 there? Well, it turns out Crystal adds an Int32 to store a type id associated with the class. This doesn’t make much sense right now, but when we’ll talk about how unions are represented it will make more sense.

Let’s see the same for a struct:

struct Point
  def initialize(@x, @y)
  end
end

x = Point.new(1, 2)

The LLVM IR code contains:

%Point = type { i32, i32 }
...
%x = alloca %Point

In this case a struct doesn’t contain the extra Int32 field for the type id.

Now comes the fun part: unions!

Unions

Crystal supports unions of arbitrary types. For example you can have a variable that has either an Int32 or a Bool:

if 1 == 2
  x = 3
else
  x = false
end

At the end of the if the variable x will either be 3 or false, which makes it type an Int32 or a Bool. The Crystal way to talk about a union is using a pipe, like this: Int32 | Bool. In the LLVM IR code we can find:

%"(Int32 | Bool)" = type { i32, [1 x i64] }
...
%x = alloca %"(Int32 | Bool)"

We can see that the representation of this particular union is an LLVM structure containing two fields. The first one will contain the type id of the value. The second one is the value itself, which is a bit array as large as the largest type in that union (due to some alignment concerns, the size is extended to 64 bits boundaries in 64 bit architectures). In C it would be:

struct Int32OrBool {
  int type_id;
  union {
    int int_value;
    bool bool_value;
  };
}

The first field, the type id, will be used by the compiler when you invoke a method on x.

So, it would seem that here ends the story about how union types are represented. However, there are some unions that are very common: nilable types.

We didn’t talk about Nil previously, but since it can only contain a single value, and you can’t use void for a value, its represented as i1:

x = nil # %x = alloca i1

Let’s make now a union of nil and a class:

if 1 == 2
  x = nil
else
  x = Point.new(1, 2)
end

If we check the LLVM IR code we will see this for x:

%x = alloca %Point*

So a union of Point | Nil, where Point is a class, is represented in the same was as the Point class. How can we tell if x is Nil or Point? Easy: a null pointer means it’s Nil, a non-null pointer means it’s a Point.

In fact, all unions that only involve classes and/or nil are always represented as a single pointer. If it’s a null pointer, it’s Nil. Otherwise, if the union contains many possible classes, we can know the type with the first member of the value, an Int32, remember? Having all of these unions be represented as pointers makes the code much more efficient, as pointers fit in registers and occupy very little memory.

However, a union of Nil and a struct will always be represented as a tagged union, like the Int32 | Bool case. But these unions are much less common.

Now that we understand how types are represented and how, at runtime, we can know what type is contained in a union, let’s talk about method dispatch.

Method dispatch

Although Crystal is object-oriented, method lookup and dispatch work very different than other object-oriented languages. For example, there are no virtual tables and no metadata stored for types (except that type id field we talked before). We try to minimize the runtime data needed for a program to work, and also maximize its speed of execution, sometimes sacrificing the resulting binary size (which doesn’t grow a lot, either). For example, let’s consider this class hierarchy:

module Moo
  def foo
    1
  end
end

class Foo
  def foo
    2
  end
end

class Bar < Foo
  include Moo
end

class Baz < Bar
end

Baz.new.foo #=> 1

Wow, a big class hierarchy and even an included module, and two definitions for foo. By looking at the code, can you know which foo method will get invoked in this case?

…

Well, it’s Moo#foo, right? Yes, indeed. Well, it turns out the compiler knows this too, and if you take a look at the generated code you will see something like this:

; Create a Bar
%0 = call %Baz* @"*Baz::new:Baz"()
; Invoke Moo#foo: no method lookup
%1 = call i32 @"*Baz@Moo#foo<Baz>:Int32"(%Baz* %0)

...

define internal i32 @"*Baz@Moo#foo<Baz>:Int32"(%Baz* %self) {
entry:
  ret i32 1
}

What happens if we create an instance of Bar and we invoke foo on it too:

Bar.new.foo
Baz.new.foo

Now the LLVM IR code contains this:

%0 = call %Bar* @"*Bar::new:Bar"()
%1 = call i32 @"*Bar@Moo#foo<Bar>:Int32"(%Bar* %0)
%2 = call %Baz* @"*Baz::new:Baz"()
%3 = call i32 @"*Baz@Moo#foo<Baz>:Int32"(%Baz* %2)
...
define internal i32 @"*Bar@Moo#foo<Bar>:Int32"(%Bar* %self) {
entry:
  ret i32 1
}

define internal i32 @"*Baz@Moo#foo<Baz>:Int32"(%Baz* %self) {
entry:
  ret i32 1
}

Oops, isn’t there a duplicated definition of foo there? Well, yes. You can think as if the compiler copied foo’s definition into each class, and so there will be, indeed, many copies of the same method. But this doesn’t matter much: most methods are not big, and method call speed is much more important. Furthermore, small methods get inlined anyway in an optimized build, and there’s even an LLVM transformation pass to detect duplicated functions and merge them.

Of course, the story changes a bit if Moo#foo invokes an instance method or uses an instance variable. In this case the “duplicated” methods will actually be different, again, as if we copied each method definition into each type that finally contains it. This makes method call as efficient as possible, at the cost of (possibly) increasing executable size. But end users are usually more concerned about speed than executable size.

All of the above is possible because the compiler knows the exact type of Bar.new. What happens if the compiler doesn’t know this? Let’s start with a simple union where types are not classes in the same hierarchy:

class Foo
  def foo
    1
  end
end

class Bar
  def foo
    1
  end
end

if 1 == 2
  obj = Foo.new
else
  obj = Bar.new
end
obj.foo

This time the compiler will generate code that more or less does this: before invoking foo on obj, check what type is obj. This can be known by loading the first field (the type id) of the pointer that represents the object. Then based on this we invoke one method or another. The decision for this is just one memory load and a comparison: very efficient. For a bigger union it would still be one memory load or just reading the type id field of a union, and then many comparisons. But… wouldn’t a lookup table be faster?

Well, it turns out LLVM is pretty smart, and when it detects many comparisons it can sometimes build a lookup table for us. For this to work better, the numbers inside the lookup table must be close to each other (imagine a lookup table for the values 1 and 1000000, it would take a lot of space so LLVM would decide to do comparisons in that case). Luckily, we assign type ids in a way that helps LLVM achieve this.

When we say big unions chances are that that union contains classes of the same hierarchy: you usually build a class hierarchy to make all types follow a certain rule, respond to a similar set of methods. Although you can do this without a class hierarchy, they are a very common way of structuring code.

Consider this class hierarchy:

class Foo; end
class Bar < Foo; end
class Baz < Bar; end
class Qux < Bar; end

Considering these types only, the compiler assigns type ids in a post-order way: first Baz gets assigned 1, then Qux gets assigned 2, then Bar gets assigned 3, and finally Foo gets assigned 4. Also, the compiler tracks the range of type ids of a type’s subtypes, including itself, so for Bar it also assigns the range 1-3, and for Foo it assigns the range 1-4.

Now, consider this:

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def foo
    2
  end
end

class Baz < Bar; end
class Qux < Bar; end

obj = # ... a union of the above types
obj.foo

First, the compiler will type obj as Foo+, meaning it can be Foo or one of its sublcasses (read more about this here). In this case, there will be only two different method instantiations: one for Foo+ and one for Bar+, since Baz and Qux don’t redefine that method. To know which one we need to call, we load the type id. Then, instead of having to say “if the type id is that of Bar, or Baz or Qux, call Bar#foo, otherwise call Foo#foo`, we can simply check if the type id is in the range previously assigned to Bar (1-3): just two comparisons.

This range check also works with is_a?. When you do obj.is_a?(Foo), and maybe obj is an Int32 or, Foo, or one of its subclasses, we can solve this with at most two comparisons.

Finally, an interesting aspect of Crystal is that method dispatch happens based on possibly many types:

def foo(x : Int32, y : Char)
end

def foo(x : Char, y : Int32)
end

def foo(x, y)
end

foo 1, 'a'

And this also works if the type of all arguments is not known at compile time. But… this blog post is getting a bit long and complex by now: there are many more micro-optimizations that we apply to your code to make it as efficient as possible. So don’t be afraid to use Crystal to its full potential :-)