A type system for RCL

Part I

Introduction

written by Ruud van Asseldonk
published

I am building a new configuration language: RCL. From the start I intended it to have types, but initially it was implemented as a completely dynamic language. Now that I’m adding a typechecker, I thought it would be interesting to look at the type system and its implementation. The type system is by no means complete (in particular, record types and importing types across files are not yet supported), but there is enough to fill a few posts. This introduction explores RCL and what problems a type system for RCL should and should not solve. In parts two and three we’ll explore the type system itself and related work, and in part four we’ll dive into the internals of the typechecker.

My goal with this series is twofold:

In this series:

What is RCL?

Before I can explain the type system, let’s recap the language that we are typing. RCL is a superset of json that extends it into a simple functional language similar to Nix. Its goal is to enable abstraction and reuse in repetitive configuration such as CI workflows, deployment configuration such as Ansible playbooks and Kubernetes manifests, and infrastructure-as-code tools like OpenTofu. An RCL document consists of a single expression and can be exported to json, yaml, or toml. Any time somebody contemplates templating yaml, they should consider using RCL instead.

The data model is that of json plus sets and functions. Also, dictionary keys are not limited to strings; they can be any value. Aside from literals for these new values, RCL adds let-bindings, list comprehensions, and a few other features to json. Here is an example document:

let colors = ["blue", "green"];
let port_number = offset => 8000 + (100 * offset);
{
  deployments = [
    for i, color in colors.enumerate():
    {
      name = color,
      port = port_number(i),
    }
  ],
}

It evaluates to the following json document:

{
  "deployments": [
    {"name": "blue", "port": 8000},
    {"name": "green", "port": 8100}
  ]
}
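For readers coming from a general-purpose language, a rough Python analogue of the RCL document above may help show what each construct does. This is an illustrative sketch, not RCL: it assumes that iterating `colors.enumerate()` with two variables binds the index and the element, which is what the json output implies, and maps that onto Python’s `enumerate`.

```python
import json

# A rough Python analogue of the RCL example (illustration only, not RCL).
colors = ["blue", "green"]


def port_number(offset: int) -> int:
    # Same arithmetic as the RCL lambda `offset => 8000 + (100 * offset)`.
    return 8000 + (100 * offset)


document = {
    "deployments": [
        # Python's enumerate() yields (index, element) pairs, playing the
        # role of `for i, color in colors.enumerate()` in the RCL version.
        {"name": color, "port": port_number(i)}
        for i, color in enumerate(colors)
    ],
}

print(json.dumps(document, indent=2))
```

The let-bindings become ordinary variables and the lambda becomes a function, while the comprehension inside the dictionary literal has the same shape in both languages.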

For a more gradual introduction, check out the tutorial, or try the interactive demos on the website.

Why types?

RCL is implemented as a tree-walking interpreter that can dispatch on the values it encounters. It does not have a compiler that needs type information to know what instruction to select, or to emit the correct memory offset for a struct field. So why do we need types at all?
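To illustrate what dispatching on values means, here is a minimal tree-walking evaluator in Python for a hypothetical three-node expression language (`Lit`, `Add`, `Not`). It is not RCL’s interpreter, and the node names are made up for this sketch. The point is only that the evaluator decides what to do by inspecting the runtime values it gets back, so it needs no type information up front.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes for illustration; not RCL's real AST.
@dataclass
class Lit:
    value: object


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass
class Not:
    operand: "Expr"


Expr = Union[Lit, Add, Not]


def evaluate(expr: Expr) -> object:
    """Evaluate by walking the tree, dispatching on the runtime values."""
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Add):
        lhs, rhs = evaluate(expr.lhs), evaluate(expr.rhs)
        # Use `type(...) is int` because Python's bool is a subclass of int.
        if type(lhs) is int and type(rhs) is int:
            return lhs + rhs
        raise TypeError(f"cannot add {type(lhs).__name__} and {type(rhs).__name__}")
    if isinstance(expr, Not):
        operand = evaluate(expr.operand)
        if type(operand) is bool:
            return not operand
        raise TypeError(f"cannot negate {type(operand).__name__}")
    raise ValueError(f"unknown expression: {expr!r}")


print(evaluate(Add(Lit(8000), Lit(100))))  # prints 8100
```

An error such as negating an integer only surfaces when evaluation actually reaches that node, which is exactly the gap a typechecker can close.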

Well, we don’t need types. But I want to have them for two reasons:

It is no surprise that TypeScript has largely displaced JavaScript, and that Mypy has taken the Python world by storm. To keep a large codebase maintainable, you need types.

The snippet in the previous section is pretty clear on its own though. It would be a shame to make it more verbose than necessary, especially for a configuration language that tries to eliminate boilerplate. So type annotations in RCL are optional. The type system is gradual, so you can clarify and enforce types when necessary, but you don’t have to specify them in straightforward code.

Typing json

The purpose of RCL is to output configuration files for other tools. These tools can demand any schema. If RCL placed limitations on that, it would not be a very useful configuration language. This means that the type system must be able to deal with constructs that some type systems would reject, such as heterogeneous lists, or if-else expressions where the then-branch returns a different type than the else-branch.

// What do we need to put on the ? to make the types explicit?
let xs: List[?] = [42, true, "yes"];
let y: ? = if xs.contains(21): "has-21" else null;

With type annotations removed, the above code is well-typed. But the fact that annotations are optional doesn’t mean that variables don’t have types. So what are the types of xs and y? RCL answers this with a type lattice, and the inferred type for both question marks is Any. But I’m getting ahead of myself; we’ll see more about the type lattice in the next post.
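The lattice gets a proper treatment in the next post, but a heavily simplified sketch may help build intuition in the meantime. It assumes a flat lattice: a bottom element (here called `Void`), the concrete types `Null`, `Bool`, `Int`, and `String`, and `Any` at the top. The inferred element type of a list is the join (least upper bound) of its element types, and the type of an if-else expression is the join of its branch types. The function names are illustrative, not RCL’s API, and RCL’s real lattice is richer than this.

```python
from functools import reduce

# Simplified flat lattice, for illustration only:
# Void (bottom) <= Null, Bool, Int, String <= Any (top).
VOID = "Void"
ANY = "Any"


def join(a: str, b: str) -> str:
    """Least upper bound of two types in the flat lattice."""
    if a == VOID:
        return b
    if b == VOID:
        return a
    return a if a == b else ANY


def type_of_literal(value: object) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if value is None:
        return "Null"
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, str):
        return "String"
    return ANY


def type_of_list(elements: list) -> str:
    # Start from the bottom element, so in this sketch [] gets List[Void].
    elem = reduce(join, (type_of_literal(x) for x in elements), VOID)
    return f"List[{elem}]"


def type_of_if_else(then_type: str, else_type: str) -> str:
    # Either branch may be taken, so the result is the join of both.
    return join(then_type, else_type)


print(type_of_list([42, True, "yes"]))  # List[Any]
print(type_of_if_else(type_of_literal("has-21"), type_of_literal(None)))  # Any
```

In this flat sketch, any two distinct concrete types join to `Any`, which is why both question marks in the RCL snippet come out as `Any`.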

Static vs. runtime

For long-running daemons or programs deployed to users, types are essential for building robust software. Such programs need to be prepared to handle any situation at runtime, because if there is an unhandled runtime error, there is no developer watching to fix the program. A type system can help the programmer to discover and handle edge cases ahead of time.

Configuration languages are at the other end of this spectrum. An RCL program does not need to be able to handle any situation; it needs to handle exactly one. The program doesn’t even have any inputs: all parameters are “hard-coded” into it. After all, the program itself is the configuration file.

For an RCL program, there is no “run-time”. A user will run RCL to generate some configuration, and if that succeeds, RCL is out of the picture. Internally it has separate typechecking and evaluation phases, but these run directly after one another, and the user will see static errors and runtime errors at the same moment.

Because of this specialized use case, runtime errors in RCL are not nearly as bad as in some other languages. If we can’t prevent an error statically, we can defer to a runtime check, and the user will still learn about the error at compile time. Runtime errors are static errors in RCL.
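Deferring to a runtime check can be sketched as a three-way decision. In this simplified Python sketch (the function names and string-encoded types are illustrative assumptions, not RCL’s implementation), a use of a value whose static type is known is either accepted or rejected immediately, while a value of static type `Any` gets a check that runs during evaluation. Because evaluation follows typechecking directly, both kinds of error reach the user in the same run.

```python
ANY = "Any"


def check(expected: str, found: str) -> str:
    """Decide statically what to do when a value of type `found` is used
    where `expected` is required (flat-lattice simplification)."""
    if expected == ANY or found == expected:
        return "ok"             # statically fine, nothing to do at runtime
    if found == ANY:
        return "runtime-check"  # cannot decide yet, so check the value later
    return "static-error"       # definitely wrong, report before evaluating


def type_of_value(value: object) -> str:
    names = {bool: "Bool", int: "Int", str: "String", type(None): "Null"}
    return names.get(type(value), ANY)


def runtime_check(expected: str, value: object) -> object:
    """The deferred check, run during evaluation on the actual value."""
    found = type_of_value(value)
    if expected != ANY and found != expected:
        raise TypeError(f"Type mismatch. Expected {expected} but found {found}.")
    return value


print(check("Bool", "Int"))  # static-error
print(check("Bool", ANY))    # runtime-check
```

For a long-running program the `runtime-check` branch would be a weakness, but in RCL it fails in the same invocation as a static error would.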

What should be well-typed?

The type system is a new addition to RCL. Although it is a goal for RCL to be able to represent any json document, it is not a goal that every expression that could be evaluated before the typechecker was added is well-typed. For example, the following program has a type error, even though it can be evaluated with the typechecker disabled:

let ints = [
  for x in [1, 2, 3]:
  if x > 10:
  x
];
[for i in ints: not i]

RCL reports the following error:

  |
6 | [for i in ints: not i]
  |                      ^
Error: Type mismatch. Expected Bool but found Int.

  |
6 | [for i in ints: not i]
  |                  ^~~
Note: Expected Bool because of this operator.

  |
2 |   for x in [1, 2, 3]:
  |             ^
Note: Found Int because of this value.

It is true that the not operator cannot be applied to integers, but that type error is never exposed at runtime because ints is empty, so without the typechecker, evaluation succeeds. I am fine with rejecting pathological code like this. After all, trying to negate an integer is probably a bug, even if the code path is unreachable.
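A Python sketch of the two phases applied to this program makes the difference concrete. The evaluation half mirrors the RCL code directly. The typechecking half is a hand-written simplification with hypothetical function names: it assumes, as the error message above suggests, that the `if` filter in a comprehension can remove elements but never changes their type, so `ints` has element type `Int` even though it is empty at runtime.

```python
from typing import List


def evaluate_example() -> list:
    """Dynamic semantics: run the program and look only at actual values."""
    ints = [x for x in [1, 2, 3] if x > 10]  # the filter drops every element
    return [not i for i in ints]             # the body never runs, so no error


def typecheck_example() -> List[str]:
    """Static semantics (simplified sketch): reason about types, not values."""
    errors = []
    source_elem = "Int"  # element type of [1, 2, 3]
    # A filter can drop elements but cannot change their type, so `ints`
    # has element type Int regardless of whether it ends up empty.
    ints_elem = source_elem
    # `not` requires a Bool operand, but the loop variable has type Int.
    if ints_elem != "Bool":
        errors.append(f"Type mismatch. Expected Bool but found {ints_elem}.")
    return errors


print(evaluate_example())   # []
print(typecheck_example())  # ['Type mismatch. Expected Bool but found Int.']
```

Evaluation alone never touches the bad `not`, while the typechecker, which only looks at types, rejects it regardless of whether the code path is reachable.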

Putting it together

RCL is a new configuration language that aims to reduce configuration boilerplate by extending json into a simple functional language that enables abstraction and reuse. I am adding support for type annotations and a typechecker to it. What do I want from the type system?

In the remainder of this series, we’ll see how RCL achieves this. In part two we will look at the type system, and in part three at some related type systems that inspired RCL. Finally, in part four we will look at the implementation of the typechecker itself. If this post got you interested in RCL, check out the type system documentation, and try RCL in your browser!
