Hey hey, I’m back! Been a while since I posted, but I’m ready to get back into things! Callisto has been through some big changes while this series was on hold, but nothing that affects backends too significantly. The main developer, yeti, has been hinting at some possible changes that do affect backends though, so I guess I should try to finish up this series before that.
So far in this series, we’ve looked at what Callisto is, and how its compiler and standard library are designed. With this background, it’s time to cover how I added ARM64 support to Callisto.
ARM Assembly Crash Course
I won’t go through everything there is to know about the ARM64 architecture here, but I do want to cover most of the features that I use. That way, this post should hopefully be somewhat understandable, even if you don’t have experience with ARM64.
Registers
Most of the ARM64 registers are general purpose, and don’t have any special behaviour, at least when it comes to assembly instructions. There is however a standard calling convention which defines which registers can be overwritten when calling functions, and this is important to keep in mind when calling external functions from Callisto.
Registers | Usage |
---|---|
x0 - x7 | Arguments for external functions |
x8 | Indirect return address (unused here) |
x9 - x15 | Local variables, functions may overwrite |
x16 - x18 | Special purpose (unused here) |
x19 - x28 | Functions must not overwrite |
x29 | Frame pointer (unused here) |
x30 (or lr) | Link register, contains the return address when calling a function |
sp | System stack pointer |
xzr | Always zero |
Accessing these registers as 32-bit values is done by swapping the x
for a w
Instructions
Here are the main instructions that are used throughout the ARM64 backend:
Instruction | Description |
---|---|
mov xa, xb |
Sets register xa to the contents of register xb . |
mov xa, value |
Sets xa to a given 16-bit value. |
ldr xa, =value |
Sets xa to a given 64-bit value (by loading from memory). |
ldr xa, [xb] |
Load xa from the address stored in xb . |
str xa, [xb] |
Stores the value in xa to the address in xb . |
add xa, xb, xc |
Adds xb and xc , storing the result in xc . |
sub xa, xb, xc |
Likewise for subtraction. |
b label |
Jumps to the label label . |
beq label |
Jumps to label if the previous comparison was equal. |
bne label |
Jumps to label if the previous comparison was not equal. |
bl label |
Jumps to label , saving the return address to lr . |
ret |
Returns to the address in lr . |
There are more that will pop up throughout, but these make up the majority of the assembly output.
Stacks
Callisto is a stack-oriented language, so it will be useful to understand how to work with stacks. Some architectures have dedicated push
and pop
instructions that always operate on a hardware stack (including 32-bit ARM), but this is not the case on ARM64.
Instead, stack operations should make use of additional features in the ldr
and str
instructions. These are the pre-indexed and post-indexed modes. These modes allow you to use a single ldr
or str
instruction to both write a value to memory, and update the value in the register used as a memory address.
Here are some examples of how they look:
; Store x0 to the address in x1, then add 8 to x1.
str x0, [x1], #8
; Add 8 to x3, then store x2 to the new address in x3.
str x2, [x3, #8]
; Subtract 8 from x5, then load x4 from the new address in x5.
ldr x4, [x5, #-8]
; Add 16 to sp, then load x8 from the new address.
ldr x8, [sp, #16]
The advantage of these instructions is that you can choose to treat any register as a stack pointer, not just the special sp
register.
Picking Registers
Callisto requires two stacks in each backend. One of these is used as the main working stack, where function arguments are taken from and return values are placed, and the other is used to store return addresses and local variables. In the x86_64 backend, r15
is used as a pointer for the working stack, while the dedicated rsp
register is used for the return/local stack.
I’ll mirror this in ARM, and use the first callee-saved register, x19
to index the working stack, sp
for the returns and locals, and x9
-x15
can be temporary registers, though I won’t use them all.
This should work well, as x19
will never be affected even by calls to C functions, thanks to its callee-saved nature in the system calling convention, and sp
was designed for exactly this purpose – that’s what C code will do too, so I shouldn’t run into any issues, right? Right?
The Backend
The first step to supporting a new CPU architecture in Callisto is to add a new backend. This involves creating a new subclass of CompilerBackend
, and adding a new option to the -b
flag in app.d to create this subclass:
// backends/arm64.d
import callisto.compiler;
class BackendARM64 : CompilerBackend {
}
// app.d
// ...
case "arm64": {
backend = new BackendARM64();
break;
}
// ...
The backend class needs a handful of methods to define basic properties of the target. Some are very straightforward, such as MaxInt
, which defines the largest value supported by the target, and DefaultHeader
, which adds output to the start of the assembly (which we don’t need here).
override long MaxInt() => -1; // -1 will be 64 ones in binary.
override string DefaultHeader() => "";
Along with these, we have GetVersions
, which returns an array of the supported features of this target. The complete list depends on the target OS – when targeting bare-metal devices, for instance, there won’t be any built in file support:
override string[] GetVersions() {
// CPU features
string[] ret = ["arm64", "LittleEndian", "16Bit", "32Bit", "64Bit"];
// OS features
switch (os) {
case "linux": {
ret ~= ["Linux", "IO", "File", "Args", "Time", "Heap", "Exit"];
break;
}
default: break;
}
return ret;
}
The last notable configuration function is FinalCommands
, where we define the commands to run after generating the output assembly. There are four commands to run:
- Move the generated code to a specific file:
mv output output.asm
- Assemble the code to an object file:
as output.asm -o output.o
- Link the object file into a binary:
ld output.o -o output
- Remove the temporary files:
rm output.asm output.o
The file names have to be updated according to the actual output name given to the compiler, and some of these commands may end up with additional arguments in some cases. FinalCommands
checks the compiler options and builds an array of strings containing the commands to run:
override string[] FinalCommands() {
bool isCross = executeShell("which aarch64-linux-gnu-as").status == 0;
string assembler = isCross ? "aarch64-linux-gnu-as" : "as";
string linker = isCross ? "aarch64-linux-gnu-ld" : "ld";
string[] ret = [
format("mv %s %s.asm", compiler.outFile, compiler.outFile),
];
string assembleCommand = format(
"%s %s.asm -o %s.o", assembler, compiler.outFile, compiler.outFile,
);
if (useDebug) {
assembleCommand ~= " -g";
}
ret ~= assembleCommand;
string linkCommand = format(
"%s %s.o -o %s", linker, compiler.outFile, compiler.outFile
);
foreach (ref lib ; link) {
linkCommand ~= format(" -l%s", lib);
}
if (useLibc) {
string[] possiblePaths = [
"/usr/aarch64-linux-gnu/lib/crt1.o",
"/usr/lib/crt1.o",
"/usr/lib64/crt1.o",
];
bool crt1;
foreach (ref path ; possiblePaths) {
if (path.exists) {
crt1 = true;
linkCommand ~= format(" %s", path);
linkCommand ~= format(" %s/crti.o", path.dirName);
linkCommand ~= format(" %s/crtn.o", path.dirName);
break;
}
}
if (!crt1) {
stderr.writeln("WARNING: Failed to find crt1.o, program may behave incorrectly");
}
}
ret ~= linkCommand;
if (!keepAssembly) {
ret ~= format("rm %s.asm %s.o", compiler.outFile, compiler.outFile);
}
return ret;
}
A Map of the Program
The compilation process is driven by compiler.d, specifically in the Compile
method.
void Compile(Node[] nodes) {
// ...
backend.Init();
// ...
Node[] header;
Node[] main;
foreach (ref node ; nodes) {
switch (node.type) {
case NodeType.FuncDef:
case NodeType.Include:
case NodeType.Let:
case NodeType.Enable:
case NodeType.Requires:
case NodeType.Struct:
case NodeType.Const:
case NodeType.Enum:
case NodeType.Union:
case NodeType.Alias:
case NodeType.Extern:
case NodeType.Implement: {
header ~= node;
break;
}
default: main ~= node;
}
}
foreach (ref node ; header) {
CompileNode(node);
}
backend.BeginMain();
foreach (ref node ; main) {
CompileNode(node);
}
backend.End();
}
There are five main sections to the compiler output:
Init
generates the code that runs first when the program is executed.- Definitions (or the
header
) are compiled. This includes both function and type definitions. BeginMain
generates the setup code for the Callisto main function.- All code outside of the previously compiled definitions is compiled, making up the main function.
End
generates the final code to be run at the end of the main function.
The first thing to look at here are the three special functions: Init
, BeginMain
, and End
, as they are responsible for making the compiled program work at all.
Getting Started
Everything starts somewhere, and when you’re creating executables with ld
, that somewhere is _start
. In the Callisto compiler, the Init
generates this _start
function, with code to prepare the stack and perform other setup.
override void Init() {
// ...
output ~= ".text\n";
if (useLibc) {
output ~= ".global main\n";
output ~= "main:\n";
} else {
output ~= ".global _start\n";
output ~= "_start:\n";
}
output ~= "bl __init\n";
// allocate data stack
output ~= "sub sp, sp, #4096\n";
output ~= "mov x19, sp\n";
// jump to main
output ~= "b __calmain\n";
// create functions for interop
if (exportSymbols) {
output ~= "
.global cal_push
cal_push:
str x0, [x19], #8
ret
.global cal_pop
cal_pop:
ldr x0, [x19, #-8]!
ret
";
}
}
The generated assembly does a few things:
- Calls a
__init
function, to be defined later. - Allocates 4096 bytes, or 512 Callisto “cells” (64-bit integers) of stack space for the Callisto working stack.
- Jumps to the
__calmain
label, where execution of Callisto code begins. - If requested by the user, the init section also contains helper functions to push and pop values from the Callisto stack – allowing integration with another language.
You’ll also notice that when configured to link with libc, the code actually starts from main
, not _start
. The C runtime has its own version of _start
, which it uses for initialisation, and it will call main
for us.
BeginMain
, the function where __calmain
is actually defined, is a fair bit simpler.
override void BeginMain() {
output ~= "__calmain:\n";
// call constructors
foreach (name, global ; globals) {
if (global.type.hasInit) {
output ~= format("ldr x9, =__global_%s\n", name.Sanitise());
output ~= "str x9, [x19], #8\n";
output ~= format("bl __type_init_%s\n", global.type.name.Sanitise());
}
}
}
It creates a __calmain
label – the main code within this function will be defined later when Compile
compiles all of the main syntax nodes. By the time BeginMain
is called, globals and custom types will have been defined, so it also generates calls to initialisation functions for any globals that have them. The address of a global is placed on the stack, and then the init function is called.
Getting Finished
After emitting code for all of the syntax nodes, End
is responsible for cleaning up before exiting, as well as defining storage in the correct executable sections for all the global variables used in the program. Let’s break this one down bit by bit.
First up, it generates code to deconstruct all of the global variable in the program, just like the init code in BeginMain
:
override void End() {
// call destructors
foreach (name, global ; globals) {
if (global.type.hasDeinit) {
output ~= format("ldr x9, =__global_%s\n", name.Sanitise());
output ~= "str x9, [x19], #8\n";
output ~= format("bl __type_deinit_%s\n", global.type.name.Sanitise());
}
}
It then looks for a specially named word/function to run at the end of the program. Without this, the program would actually not exit correctly (unless using libc), because we need to make a system call to exit safely.
This part of End
is also where the __init
function mentioned earlier is defined. It uses the exact same system, looking for a specific word to have been defined. This wasn’t possible back in Init
, as the function definitions hadn’t yet been processed.
// exit program
if ("__arm64_program_exit" in words) {
CallFunction("__arm64_program_exit");
}
else {
WarnNoInfo("No exit function available, expect bugs");
}
output ~= "ret\n";
// run init function
output ~= "__init:\n";
if ("__arm64_program_init" in words) {
CallFunction("__arm64_program_init");
}
else {
WarnNoInfo("No program init function available");
}
output ~= "ret\n";
All of the code has been generated by this point, and the only thing left to do is set up the program’s data. First, we’ll output a bss
section, which is used for uninitialised data. In it, each global variable is defined with a .lcomm
directive, telling the assembler how much space to reserve in the section for this global.
// create global variables
output ~= ".bss\n";
foreach (name, var ; globals) {
output ~= format(".lcomm __global_%s, %d\n", name.Sanitise(), var.Size());
if (exportSymbols) {
output ~= format(".global __global_%s\n", name.Sanitise());
}
}
Finally, we generate code to set up the arrays. Arrays are special, because unlike global variables, they contain initial data that needs to be stored in the executable.
Each array gets a label in the .data
section in the form __array_0
, and the array data is placed after a .byte
or similar directive:
output ~= ".data\n";
foreach (i, ref array ; arrays) {
output ~= ".align 8\n";
if (exportSymbols) {
output ~= format(".global __array_%d\n", i);
}
output ~= format("__array_%d: ", i);
switch (array.type.size) {
case 1: output ~= ".byte "; break;
case 2: output ~= ".2byte "; break;
case 4: output ~= ".4byte "; break;
case 8: output ~= ".8byte "; break;
default: assert(0);
}
foreach (j, ref element ; array.values) {
output ~= element ~ (j == array.values.length - 1? "" : ", ");
}
output ~= '\n';
Global arrays also have a struct automatically created and filled with the details of the array.
if (array.global) {
output ~= ".align 8\n";
output ~= format(
"__array_%d_meta: .8byte %d, %d, __array_%d\n",
i,
array.values.length,
array.type.size,
i
);
}
}
Nodes and Types
It’s finally time to start compiling AST nodes! 🎉
These nodes are passed from the Compiler
class to our CompilerBackend
, calling a different function for each type of node.
I’ll start with some nodes that don’t actually produce any assembly output. CompileConst
is one such node, as it just defines a constant that other code can reference later:
override void NewConst(string name, long value, ErrorInfo error = ErrorInfo.init) {
consts[name] = Constant(new IntegerNode(error, value));
}
override void CompileConst(ConstNode node) {
if (node.name in consts) {
Error(node.error, "Constant '%s' already defined", node.name);
}
NewConst(node.name, node.value);
}
NewConst
is useful because it allows other backend components to define constants directly, such as in CompileEnum
, which creates a copy of an existing numeric type, and associated constants:
override void CompileEnum(EnumNode node) {
if (!TypeExists(node.enumType)) {
Error(node.error, "Enum base type '%s' doesn't exist", node.enumType);
}
if (TypeExists(node.name)) {
Error(node.error, "Enum name is already used by type '%s'", node.enumType);
}
auto baseType = GetType(node.enumType);
baseType.name = node.name;
types ~= baseType;
foreach (i, ref name ; node.names) {
NewConst(format("%s.%s", node.name, name), node.values[i]);
}
NewConst(format("%s.min", node.name), node.values.minElement());
NewConst(format("%s.max", node.name), node.values.maxElement());
NewConst(format("%s.sizeof", node.name), GetType(node.name).size);
}
Another couple of nodes for defining types are CompileUnion
, which defines a type like a C union (though all it really does here is give the type the maximum size of the child types), and CompileAlias
, which defines a new name for an existing type.
override void CompileUnion(UnionNode node) {
size_t maxSize = 0;
if (TypeExists(node.name)) {
Error(node.error, "Type '%s' already exists", node.name);
}
string[] unionTypes;
foreach (ref type ; node.types) {
if (unionTypes.canFind(type)) {
Error(node.error, "Union type '%s' defined twice", type);
}
unionTypes ~= type;
if (!TypeExists(type)) {
Error(node.error, "Type '%s' doesn't exist", type);
}
if (GetType(type).size > maxSize) {
maxSize = GetType(type).size;
}
}
types ~= Type(node.name, maxSize);
NewConst(format("%s.sizeof", node.name), cast(long) maxSize);
}
override void CompileAlias(AliasNode node) {
if (!TypeExists(node.from)) {
Error(node.error, "Type '%s' doesn't exist", node.from);
}
if ((TypeExists(node.to)) && !node.overwrite) {
Error(node.error, "Type '%s' already defined", node.to);
}
auto baseType = GetType(node.from);
baseType.name = node.to;
types ~= baseType;
NewConst(format("%s.sizeof", node.to), cast(long) GetType(node.to).size);
}
The last node for defining types is CompileStruct
, which defines a compound structure that can hold multiple other values inside. Currently, using structures in Callisto is still a bit of a manual process – struct
definitions are just a way to create a type with the right size and some helper constants.
Now’s a good time to mention that for longer functions, I’ll omit some of the error handling code, to focus on the parts that really matter.
override void CompileStruct(StructNode node) {
size_t offset;
StructEntry[] entries;
// Used for checking duplicate names
string[] members;
// Copy the inherited fields
if (node.inherits) {
entries = GetType(node.inheritsFrom).structure;
foreach (ref member ; GetType(node.inheritsFrom).structure) {
members ~= member.name;
}
}
// Add the new fields
foreach (ref member ; node.members) {
entries ~= StructEntry(
GetType(member.type), member.name, member.array, member.size
);
members ~= member.name;
}
// Define constants for each field's offset
foreach (ref member ; entries) {
NewConst(format("%s.%s", node.name, member.name), offset);
offset += member.array? member.type.size * member.size : member.type.size;
}
NewConst(format("%s.sizeof", node.name), offset);
types ~= Type(node.name, offset, true, entries);
}
If you’re looking at these and thinking “Gee, these don’t look very backend-specific, couldn’t they be implemented in one place for all backends?”, you’re right! In the latest version of the compiler, these no longer live in arm64.d, but in compiler.d. For consistency though, I’m sticking with b0.9.0 through this series.
Nodes and Codes
Now, we can look at the nodes that emit real assembly. I’ll start with some simple ones. First, there’s CompileInteger
, responsible for compiling integer literals to be pushed onto the stack:
override void CompileInteger(IntegerNode node) {
output ~= format("ldr x9, =%d\n", node.value);
output ~= "str x9, [x19], #8\n";
}
CompileBreak
and CompileContinue
implement the usual break
and continue
control flow commands. while
loops use labels that are distinguished with unique numbers, so these nodes can just compile to branches to those labels.
override void CompileBreak(WordNode node) {
if (!inWhile) {
Error(node.error, "Not in while loop");
}
output ~= format("b __while_%d_end\n", currentLoop);
}
override void CompileContinue(WordNode node) {
if (!inWhile) {
Error(node.error, "Not in while loop");
}
output ~= format("b __while_%d_next\n", currentLoop);
}
Callisto has a built in call
word for calling a function pointer, and it’s similarly trivial to implement – pop a value from the stack, and branch there. It does require a new instruction I hadn’t mentioned: blr
is a version of bl
that takes the branch address from a register.
override void CompileCall(WordNode node) {
output ~= "ldr x9, [x19, #-8]!\n";
output ~= "blr x9\n";
}
Init/Deinit
I said that Callisto structs don’t provide much beyond a correctly sized value and some constants, but there is one more thing, and it’s important to understand before getting to many of the block nodes.
Each struct can have an associated init
and deinit
word defined, and these will be called, with the address of the struct as an argument, at the appropriate time. init
will be called either when a local is defined with let
, or at the start of the program for globals, which we saw in BeginMain
.
deinit
is more complex. For globals, there’s an obvious time to call it – at the end of the program. For locals though, deinit
has to be called whenever the local would go out of scope. This can happen in any block structure, as well as in an early return from a function. That means that the code to deinit structs will show up a few times in the backend.
The exact code differs slightly between functions, but generally it involves iterating over each variable in the current scope and checking if a deinit
implementation was provided. Then, if there was a deinit
word, we compute and push the address of the struct, and call deinit
.
Variables
The first function to look at in relation to variables is CompileLet
. In Callisto, both local and global variables are defined using let type var
, and this let
node is responsible for creating the appropriate space for the variable, as well as initialising it. CompileLet
has two parts, one for handling globals, and one for locals. Here’s the global part:
override void CompileLet(LetNode node) {
if (inScope) {
// Take care of locals
}
else {
Global global;
global.type = GetType(node.varType);
global.array = node.array;
global.arraySize = node.arraySize;
globals[node.name] = global;
}
}
Okay, that’s pretty straightforward. We’ve already seen the actual allocation, initialisation and deinitialisation of globals, in BeginMain
and End
. So what about locals?
if (inScope) {
Variable var;
var.name = node.name;
var.type = GetType(node.varType);
var.offset = 0;
var.array = node.array;
var.arraySize = node.arraySize;
foreach (ref ivar ; variables) {
ivar.offset += var.Size();
}
variables ~= var;
switch (var.Size()) {
case 1: output ~= "strb wzr, [sp, #-1]!\n"; break;
case 2: output ~= "strh wzr, [sp, #-2]!\n"; break;
case 4: output ~= "str wzr, [sp, #-4]!\n"; break;
case 8: output ~= "str xzr, [sp, #-8]!\n"; break;
default: OffsetLocalsStack(var.Size(), true);
}
if (var.type.hasInit) {
output ~= "str sp, [x19], #8\n";
output ~= format("bl __type_init_%s\n", var.type.name.Sanitise());
}
}
Not that much worse, actually. The new things that have to be done for locals are:
- Update each existing variable’s offset. This is required because locals are accessed relative to the stack pointer, which we’re about to move.
- Allocate the stack space. If this is a small value, this is done by pushing zero to the stack, otherwise by calling
OffsetLocalsStack
(see below), which doesn’t zero initialise. - Here’s where
init
comes in: if the variable’s type has an initialiser, call it by pushing the variable’s address and branching.
The OffsetLocalsStack
function handles a few possible cases for adjusting the value of sp
. It allows offsetting in either direction, using add
or sub
accordingly, and accounts for the fact that those instructions can only take a 12-bit immediate value by loading up to 16 bits with a mov
.
private void OffsetLocalsStack(size_t offset, bool sub) {
if (offset >= 4096) {
output ~= format("mov x9, #%d\n", offset);
output ~= format("%s sp, sp, x9\n", sub ? "sub" : "add");
} else {
output ~= format("%s sp, sp, #%d\n", sub ? "sub" : "add", offset);
}
}
Now that we have a way to define variables, how about a way to give them new values? Callisto uses -> var
to do this, and this is compiled by the CompileSet
method. Again, this is split between locals and globals, by searching for the given name in each set of variables.
In both cases, the first step is to fetch the top item of the stack into a register. For locals, we use the variable offset to store that register relative to the stack pointer. For globals, because ARM doesn’t support storing to an absolute address, we load the address using ldr
first.
override void CompileSet(SetNode node) {
output ~= "ldr x9, [x19, #-8]!\n";
if (VariableExists(node.var)) {
auto var = GetVariable(node.var);
switch (var.type.size) {
case 1: output ~= format("strb w9, [sp, #%d]\n", var.offset); break;
case 2: output ~= format("strh w9, [sp, #%d]\n", var.offset); break;
case 4: output ~= format("str w9, [sp, #%d]\n", var.offset); break;
case 8: output ~= format("str x9, [sp, #%d]\n", var.offset); break;
default: Error(node.error, "Bad variable type size");
}
}
else if (node.var in globals) {
auto global = globals[node.var];
output ~= format("ldr x10, =__global_%s\n", node.var.Sanitise());
switch (global.type.size) {
case 1: output ~= "strb w9, [x10]\n"; break;
case 2: output ~= "strh w9, [x10]\n"; break;
case 4: output ~= "str w9, [x10]\n"; break;
case 8: output ~= "str x9, [x10]\n"; break;
default: Error(node.error, "Bad variable type size");
}
}
else {
Error(node.error, "Variable '%s' doesn't exist", node.var);
}
}
Another useful operation on variables is to get their address – that is, to create a pointer to them. In Callisto, this is done with the C-like &var
, handled by CompileAddr
. You can also take the address of a function, which is how function pointers for call
are created.
override void CompileAddr(AddrNode node) {
if (node.func in words) {
auto word = words[node.func];
string symbol = word.type == WordType.Callisto?
format("__func__%s", node.func.Sanitise()) : node.func;
output ~= format("ldr x9, =%s\n", symbol);
output ~= "str x9, [x19], #8\n";
}
else if (node.func in globals) {
auto var = globals[node.func];
output ~= format("ldr x9, =__global_%s", node.func.Sanitise());
output ~= "str x9, [x19], #8\n";
}
else if (VariableExists(node.func)) {
auto var = GetVariable(node.func);
output ~= format("add x9, sp, #%d\n", var.offset);
output ~= "str x9, [x19], #8\n";
}
else {
Error(node.error, "Undefined identifier '%s'", node.func);
}
}
You might have noticed that there’s one key operation with variables that’s missing so far – reading them. That’s intentional, because the syntax for getting a variable’s value is just var
, which is the same as any other word. I’ll get into how plain words are compiled later, so you’ll have to wait to see how that works.
Variables × A Lot
Are you tired of creating 100 variables with names like thing1
, thing2
, thing3
, and so on? Do you want to access hundreds, thousands, MILLIONS of items through one easy name? Do you want a nice way to store text, instead of just numbers? Then have I got the product for you!
Callisto does of course have arrays, as well as strings, which are just arrays of bytes with a fancy syntax. I’ve already shown part of how they work, with End
being responsible for inserting array data into the assembly output. CompileArray
is responsible for converting the array syntax node to an Array
definition, and generating the code to set up the array and put its address on the stack.
First, it gathers up all of the array’s contents (which must be integers) into a new Array
object, and adds it to arrays
for End
to take care of:
override void CompileArray(ArrayNode node) {
Array array;
foreach (ref elem ; node.elements) {
switch (elem.type) {
case NodeType.Integer: {
auto node2 = cast(IntegerNode) elem;
array.values ~= node2.value.text();
break;
}
default: {
Error(elem.error, "Type '%s' can't be used in array literal");
}
}
}
array.type = GetType(node.arrayType);
array.global = !inScope || node.constant;
arrays ~= array;
If the array is a global (including if it’s a constant array), then putting it onto the stack is easy. End
sets up a static struct in memory for the array, and all we need to do is push its address to the stack.
if (!inScope || node.constant) {
output ~= format("ldr x9, =__array_%d_meta\n", arrays.length - 1);
output ~= "str x9, [x19], #8\n";
}
Otherwise, things are a little more complex. Because we need the ability to modify the array, not just read from it, the array set up by End
won’t be enough here. There are two steps to this process. First, we copy the contents of the array onto the locals stack:
else {
OffsetLocalsStack(array.Size(), true);
output ~= "mov x9, sp\n";
output ~= format("ldr x10, =__array_%d\n", arrays.length - 1);
output ~= format("ldr x11, =%d\n", array.Size());
output ~= "1:\n";
output ~= "ldrb w12, [x10], #1\n";
output ~= "strb w12, [x9], #1\n";
output ~= "subs x11, x11, #1\n";
output ~= "bne 1b\n";
The equivalent of this assembly on x86_64 uses a built in rep movsb
instruction to copy a large block of data. There’s no such assembly shortcut on ARM, so here I just implemented a small loop to copy the array data. There are a couple of interesting new assembly tricks here. First, the loop itself uses local labels. This means that I can output this exact block of assembly many times, without conflict. 1b
refers to the previous use of the label 1:
(1f
would refer to the next use). Second, to decrement x11
and then jump back only if it’s still positive, I used subs
. This modifies the sub
instruction to also set the flags according to its output, so the loop can keep jumping until that output is zero.
After this, the code sets up the definitions of both the raw data and the Array
struct:
Variable var;
var.type = array.type;
var.offset = 0;
var.array = true;
var.arraySize = array.values.length;
foreach (ref var2 ; variables) {
var2.offset += var.Size();
}
variables ~= var;
// create metadata variable
var.type = GetType("Array");
var.offset = 0;
var.array = false;
foreach (ref var2 ; variables) {
var2.offset += var.Size();
}
variables ~= var;
and fills in the struct fields, before pushing its address to the stack:
output ~= "mov x9, sp\n";
output ~= format("sub sp, sp, #%d\n", var.type.size);
output ~= format("ldr x10, =%d\n", array.values.length);
output ~= "str x10, [sp]\n";
output ~= format("ldr x10, =%d\n", array.type.size);
output ~= "str x10, [sp, #8]\n";
output ~= "str x9, [sp, #16]\n";
// push metadata address
output ~= "mov x9, sp\n";
output ~= "str x9, [x19], #8\n";
}
}
As strings are just a special syntax for arrays, CompileString
is a very simple function. All it needs to do is convert the StringNode
into an ArrayNode
.
override void CompileString(StringNode node) {
auto arrayNode = new ArrayNode(node.error);
arrayNode.arrayType = "u8";
arrayNode.constant = node.constant;
foreach (ref ch ; node.value) {
arrayNode.elements ~= new IntegerNode(node.error, cast(long) ch);
}
CompileArray(arrayNode);
}
Take Control
Only a few more nodes left to look at! Let’s have a look at the control flow nodes. if
and while
actually have a fair bit in common, as they both involve setting up labels in order to conditionally jump around some other piece of code. They also both introduce new scopes and therefore need to take care of calling deinit
when ready.
I’ll look at CompileIf
first. An if
statement has a series of conditions and associated blocks of code to run when the conditions are true. It can also have an else
block, to run when no conditions are true.
First, we need a unique ID to use for the labels. Each control flow structure is assigned the next number, stored in a member variable of our backend:
++ blockCounter;
auto blockNum = blockCounter;
uint condCounter;
For each pair of condition and body in the if statement, we start by compiling the nodes that make up the condition. After these nodes run, we expect a result to be left on the stack – 0 means false, anything else means true. This is why after the condition, we compile a comparison and conditional jump. The jump will skip over the body of the statement whenever the result is zero by jumping to the label with the next higher index.
foreach (i, ref condition ; node.condition) {
foreach (ref inode ; condition) {
compiler.CompileNode(inode);
}
output ~= "ldr x9, [x19, #-8]!\n";
output ~= "cmp x9, #0\n";
output ~= format("beq __if_%d_%d\n", blockNum, condCounter + 1);
Assembly that gets output here is part of the body of the if statement, and that means it starts a new scope. To take care of this, CompileIf
takes a copy of the current variables before compiling any of the body. It can then reference this copy at the end in order to only deinitialise the newly added variables, and move the stack pointer back to its original place..
auto oldVars = variables.dup;
auto oldSize = GetStackSize();
foreach (ref inode ; node.doIf[i]) {
compiler.CompileNode(inode);
}
foreach (ref var ; variables) {
if (oldVars.canFind(var)) continue;
if (!var.type.hasDeinit) continue;
output ~= format("add x9, sp, #%d\n", var.offset);
output ~= "str x9, [x19], #8\n";
output ~= format("bl __type_deinit_%s\n", var.type.name.Sanitise());
}
if (GetStackSize() - oldSize > 0) {
OffsetLocalsStack(GetStackSize() - oldSize, false);
}
variables = oldVars;
After the body, we don’t want to keep checking the other conditions, so we’ll emit a jump to the end of the entire if statement. This is also where the internal __if_1_1
type labels are created, so that each failing condition can jump over its associated body.
output ~= format("b __if_%d_end\n", blockNum);
++ condCounter;
output ~= format("__if_%d_%d:\n", blockNum, condCounter);
}
Finally, if this particular if statement has an else
block, it gets compiled, then the __if_1_end
label is created. This works the same way as the previous bodies, so I won’t show it in full.
if (node.hasElse) {
// Save the variables, compile the body, restore the stack
}
output ~= format("__if_%d_end:\n", blockNum);
CompileWhile
works quite similarly to CompileIf
, so here’s a condensed version. Comments in {}
indicate sections of code you’ve already seen:
// {Get block ID}
// Save the block ID for break/continue to use
currentLoop = blockNum;
// The condition is placed after the loop, so jump there first.
output ~= format("b __while_%d_condition\n", blockNum);
output ~= format("__while_%d:\n", blockNum);
// {Save scope}
foreach (ref inode ; node.doWhile) {
// Mark that we're in a loop so break/continue will be allowed.
inWhile = true;
compiler.CompileNode(inode);
currentLoop = blockNum;
}
output ~= format("__while_%d_next:\n", blockNum);
// {Restore scope}
inWhile = false;
output ~= format("__while_%d_condition:\n", blockNum);
foreach (ref inode ; node.condition) {
compiler.CompileNode(inode);
}
// Conditional branching, just like for `if`, but at the end.
output ~= "ldr x9, [x19, #-8]!\n";
output ~= "cmp x9, #0\n";
output ~= format("bne __while_%d\n", blockNum);
output ~= format("__while_%d_end:\n", blockNum);
And that’s it for control flow – right now, Callisto only features those two structures.
Words, Words, Words
Finally, the core of Callisto. So far, the backend can compile some data types, literal values, and some control structures. Everything else in Callisto is a word, so the ARM backend better be able to compile words – both their definitions and their usage.
I’ll go over a slightly simplified version of these compiler functions here, but as always, the Callisto source code is available to review in full. For simplicity, I’ll be omitting the code to handle inline words (words that compile their bodies directly where they are used), and the code relating to error handling, both in the compiler, and at runtime (Callisto exceptions).
New words/functions are defined with func
, and compiled using CompileFuncDef
. The start of this function is pretty straightforward, making sure that the backend state is set correctly, and setting up labels with the right names.
override void CompileFuncDef(FuncDefNode node) {
thisFunc = node.name;
assert(!inScope);
inScope = true;
words[node.name] = Word(
node.raw? WordType.Raw : WordType.Callisto, false, [], node.errors
);
// Raw words are useful for exporting from the
string symbol =
node.raw? node.name : format("__func__%s", node.name.Sanitise());
if (exportSymbols) {
output ~= format(".global %s\n", symbol);
}
output ~= format("%s:\n", symbol);
The next step is something that doesn’t exist in the x86 backend. On x86, the call
instruction does two things. First, it pushes the current program counter to the system stack, then it jumps to the given address. Keeping the return address on the stack means that nested calls work perfectly fine – each pushes its own return address, then the corresponding ret
will later pop it from the stack.
On ARM, it’s not quite so straightforward. The equivalent of call
is bl
, or Branch and Link. Rather than saving the return address to the stack, bl
saves it to the register lr
. If you try to use bl
on its own, then nested function calls won’t work at all because lr
can only store a single address. So to allow proper nested words, CompileFuncDef
will insert an instruction to push lr
to the stack here. Part of the motivation for making this a separate step in the ARM architecture is so that leaf functions – functions that don’t call other functions – can skip it, but my ARM64 Callisto backend doesn’t do so yet.
output ~= "str lr, [sp, #-8]!\n";
Function parameters are next. In Callisto b0.9.0, the version I’m looking at here, these were an optional feature, though in the latest version, they are almost always needed to work with the stack checker. Parameters are handled by creating associated variable definitions, then copying data directly from the working stack to the system/locals stack, using a copy loop like the one in CompileArray
.
size_t paramSize = node.params.length * 8;
if (paramSize > 0) {
output ~= format("sub sp, sp, #%d\n", paramSize);
foreach (ref var ; variables) {
var.offset += paramSize;
}
size_t offset;
foreach (i, ref type ; node.paramTypes) {
auto param = node.params[i];
Variable var;
var.name = param;
var.type = GetType(type);
var.offset = cast(uint) offset;
offset += var.Size();
variables ~= var;
}
output ~= format("sub x19, x19, #%d\n", paramSize);
output ~= "mov x9, x19\n";
output ~= "mov x10, sp\n";
output ~= format("mov x11, #%d\n", paramSize);
output ~= "1:\n";
output ~= "ldrb w12, [x9], #1\n";
output ~= "strb w12, [x10], #1\n";
output ~= "subs x11, x11, #1\n";
output ~= "bne 1b\n";
}
Now that the stack has been set up correctly, we can go right ahead and compile all of the function body, and the variable deinit code after that. This version of the deinitialisation loop is slightly different, though. In Callisto, functions are always the topmost scope that local variables can appear in, so there was no need to save the old variable definitions. Instead, wiping out every variable will suffice.
foreach (ref inode ; node.nodes) {
compiler.CompileNode(inode);
}
size_t scopeSize;
foreach (ref var ; variables) {
scopeSize += var.Size();
if (var.type.hasDeinit) {
output ~= format("add x9, sp, #%d\n", var.offset);
output ~= "str x9, [x19], #8\n";
output ~= format("bl __type_deinit_%s\n", var.type.name.Sanitise());
}
}
if (scopeSize > 0) {
OffsetLocalsStack(scopeSize, false);
}
These last few lines just reverse a few actions taken earlier – the link register is restored, we use it to return to the caller, and then clean up the backend state.
output ~= "ldr lr, [sp], #8\n";
output ~= "ret\n";
inScope = false;
variables = [];
}
CompileImplement
is similar to CompileFuncDef
, but without the code for handling function parameters and the code for handling exceptions (which I didn’t include anyway). This is the function that compiles init
and deinit
definitions, from implement
blocks. They are just regular functions, but with a different naming convention to avoid conflicts.
CompileReturn
essentially just contains the end of CompileFuncDef
, to allow inserting early returns into functions. It needs to call deinit
, just like the end of a function, so that’s included. inScope
and variables
are left as is, because we are still compiling the function body at this point.
override void CompileReturn(WordNode node) {
if (!inScope) {
Error(node.error, "Return used outside of function");
}
size_t scopeSize;
foreach (ref var ; variables) {
scopeSize += var.Size();
if (var.type.hasDeinit) {
output ~= format("add x9, sp, #%d\n", var.offset);
output ~= "str x9, [x19], #8\n";
output ~= format("bl __type_deinit_%s\n", var.type.name.Sanitise());
}
}
if (scopeSize > 0) {
OffsetLocalsStack(scopeSize, false);
}
output ~= "ldr lr, [sp], #8\n";
output ~= "ret\n";
}
The last of these functions to look at is CompileWord
. Since almost everything in Callisto is a word, this is a pretty important function. It’s also a pretty long function, because there are four different cases to take care of – a word can be:
- A word: It might be clearer in this context to call this a function. For these, we’ll need to jump to the corresponding piece of code.
- A local variable: If the word matches a local in the current scope, it should fetch the value of that local.
- A global variable: Similar to the previous, but the code used to load the variable is a bit different.
- A constant: By far the simplest case, this just compiles to the value of the constant, which right now is always an
IntegerNode
.
Let’s break them down one by one. Keep in mind, this is still simplified code – no inlines, no exceptions. Handling Callisto words is pretty easy. Everything is stored on the working stack, so just branching to them is enough. Raw words skip the name mangling step, but a bl
is still all that gets emitted.
if (node.name in words) {
auto word = words[node.name];
if (word.type == WordType.Raw) {
output ~= format("bl %s\n", node.name);
}
else if (word.type == WordType.C) {
// ...
}
else {
output ~= format("bl __func__%s\n", node.name.Sanitise());
}
}
The C word type that I skipped over there takes a little more work. The C calling convention on ARM64 is actually the same across all platforms, because ARM themselves standardise it. It’s quite simple, especially when handling only a limited number of integer arguments, as Callisto supports currently. For this case, we just need to place arguments into registers x0
through x7
, and fetch the result from x0
after calling. C functions also expect to be able to use the stack, but this is fine as sp
will point past Callisto’s stack.
if (word.params.length > 8) {
Error(node.error, "C call has too many parameters");
}
for (auto i = 0; i < word.params.length; i++) {
auto reg = word.params.length - i - 1;
output ~= format("ldr x%d, [x19, #-8]!\n", reg);
}
output ~= format("bl %s\n", word.symbolName);
if (!word.isVoid) {
output ~= "str x0, [x19], #8\n";
}
For variables, both local and global, the code is pretty much the inverse of CompileSet
. This allows any variable’s name to be used to directly fetch its value onto the working stack.
else if (VariableExists(node.name)) {
auto var = GetVariable(node.name);
switch (var.type.size) {
case 1: output ~= format("ldrb w9, [sp, #%d]\n", var.offset); break;
case 2: output ~= format("ldrh w9, [sp, #%d]\n", var.offset); break;
case 4: output ~= format("ldr w9, [sp, #%d]\n", var.offset); break;
case 8: output ~= format("ldr x9, [sp, #%d]\n", var.offset); break;
default: Error(node.error, "Bad variable type size");
}
output ~= "str x9, [x19], #8\n";
}
else if (node.name in globals) {
auto var = globals[node.name];
output ~= format("ldr x9, =__global_%s\n", node.var.Sanitise());
switch (var.type.size) {
case 1: output ~= "ldrb w9, [x9]\n"; break;
case 2: output ~= "ldrh w9, [x9]\n"; break;
case 4: output ~= "ldr w9, [x9]\n"; break;
case 8: output ~= "ldr x9, [x9]\n"; break;
default: Error(node.error, "Bad variable type size");
}
output ~= "str x9, [x19], #8\n";
}
Finally, compiling constants is just as simple as described before – pass the value of the constant back into the compiler to handle.
else if (node.name in consts) {
auto value = consts[node.name].value;
value.error = node.error;
compiler.CompileNode(consts[node.name].value);
}
else {
Error(node.error, "Undefined identifier '%s'", node.name);
}
Now, (excluding the simplifications I made), this backend is capable of compiling all Callisto code to ARM64 assembly, then building an executable using the appropriate commands.
Facing Reality
I have a confession to make. I’ve been lying to you this whole time. At this point in developing the backend, I wasn’t working on real ARM64 hardware. The most convenient way to test all of this was using QEMU on an x86 Linux PC, so I only tested it all inside an emulator. Once everything was pretty much working, I tested on two real ARM64 devices – a Raspberry Pi 4, and an ARM64 MacBook running a Linux virtual machine – and it failed on both.
I already hinted at the reason for this back when I was describing the registers. The culprit was sp
. While sp
can often be used as a normal register, it has an extra requirement that was not being enforced by QEMU. As Jacob Bramley wrote on the ARM Community Blog:
For AArch64,
sp
must be 16-byte aligned whenever it is used to access memory. This is enforced by AArch64 hardware.
This is an issue in Callisto, because though it might be easy to push an additional 8 bytes when pushing lr
to the stack, locals each independently allocate stack space with different sizes, so that means a lot of additional stack alignment checks. I considered handling it this way for a while, but found a much simpler solution. Because ARM64 makes it so easy to use any register as a stack pointer, I just switched the register for Callisto’s return and local stack to x20
.
This actually introduced another subtle bug, which I didn’t notice until some time after I started writing this series. Because the Callisto return stack is no longer synced with the C one, calling a C function had the potential to overwrite part of Callisto’s used memory. It should only have affected the far end of the working stack, so was unlikely to come up in practice, but I fixed that with one extra instruction when calling C functions:
and sp, x20, ~0xf
This will reset sp
to the current return stack position, and take care of alignment at the same time, rounding down to the next 16-byte boundary. Now, with all of this sorted out, Callisto actually works!
Doing Useful Things
With all this work in the backend, what can Callisto on ARM64 do now? Well… nothing. Even for Callisto to perform arithmetic, there’s more work to be done. Callisto needs its core library implemented, split into an OS and a CPU component. In the next post, I’ll get into the implementation of this core on ARM64 Linux, so that Callisto is capable of doing anything interesting at all.
Until then, I hope you enjoyed reading another (mostly code) post about bringing an obscure language to a somewhat less obscure platform!