A type system for RCL

Part IV

Implementing a typechecker in Rust

written by Ruud van Asseldonk
published 21 July 2024

I am building a new configuration language: RCL. It extends json into a simple functional language that enables abstraction and reuse. In this series we take a closer look at the type system, because I think it is interesting, but also because I hope that exposure to the wider PL community can help me to validate some of the ideas, or find problems with them.

In part one we looked at what I want from a type system for RCL, and in part two we saw what the resulting gradual type system looks like. The types form a lattice that is useful for inference, and typechecking involves a generalized form of checking subtype relationships. In this part we’ll take a closer look at how the typechecker is implemented in Rust. After this post, we will understand how RCL identifies the type error in the program below, and how it is able to localize and explain the error:

let ports = [22, 80, 443];

// Use string keys so we can export as json.
let firewall_rules: Dict[String, String] = {
  for port in ports:
  port: "allow"
};

  |
6 |   port: "allow"
  |   ^~~~
Error: Type mismatch. Expected String but found Int.

  |
4 | let firewall_rules: Dict[String, String] = {
  |                          ^~~~~~
Note: Expected String because of this annotation.

  |
1 | let ports = [22, 80, 443];
  |              ^~
Note: Found Int because of this value.

(One way to fix the error is to convert the integer to a string using string interpolation, by writing f"{port}": "allow" on line 6.)

Failed attempts

Elegant ideas appear obvious in hindsight, but we don’t often talk about the dead ends we had to explore first. The current type system with the generalized subtype check is my third attempt at writing a typechecker for RCL.

Sometimes you have a piece of code. It’s not pretty, but it works. Something feels off about it. You can’t quite put your finger on it, but it’s not something that a superficial refactor could fix. With every new feature the code grows a little more complex and intractable, but that’s just how software matures, right? And then one day — and this has happened to me only a handful of times in my career — you discover the right way of doing things. Suddenly everything becomes simple and elegant, and the features fit in naturally without complicating each other. The generalized subtype check was one of those cases.

I started with a typechecker that could only report a binary “well-typed” vs. “type error”. To handle gradual typing it had Type::Dynamic, but it was a special case everywhere, and the typechecker itself dealt with inserting runtime checks where needed. This worked but it became a mess.

For the second attempt, I tried to represent type expectations differently from inferred types, in order to generate better error messages. In hindsight, checking that the inferred type met the expectation was a subtype check, but I didn’t see it that clearly at first. I got pretty far, it even worked with covariant types like List … and then it completely failed when I got to function types, which are contravariant in their arguments.

At some point after going through those iterations, it occurred to me to view types as sets of values. The subset ordering on sets is the subtype relationship on types. For typechecking, if the actual type is a subtype of the expected type, that is well-typed. If the types are disjoint, that’s a static error. If the types overlap without being subtypes, static typechecking is inconclusive, and we defer to a runtime check. With these three possible outcomes, everything fell into place, and the implementation became a lot more elegant.

The evaluation pipeline

RCL evaluates a document in several stages:

The lexer produces a sequence of tokens.
The parser turns those into a concrete syntax tree (CST). This tree preserves all syntactic constructs, including comments, blank lines, underscores in integer literals, etc. This tree is not only used for the next evaluation stage, it is also used by the autoformatter.
The abstractor turns the CST into an abstract syntax tree (AST). The AST drops comments and normalizes constructs. For instance, it has only one kind of integer literal, whereas the CST distinguishes between binary, decimal, and hexadecimal integer literals.
The typechecker walks the AST, inferring and checking types on the fly, and inserting nodes for runtime checks where needed. Unlike more advanced type systems, RCL’s typechecker performs a single pass; it does not accumulate constraints that need to be solved later. This limits the type system in some ways, but it also keeps the typechecker fast, and it makes type errors easier to explain. The typechecker does not store most of the intermediate inferred types, they only exist for the sake of typechecking.
The evaluator walks the AST again and evaluates the document into a value. Where the evaluator had to emit runtime type errors before the addition of the typechecker, now in many cases those branches are simply unreachable!, and continuous fuzzing helps to build confidence that they really are.
Finally, the pretty-printer formats the value into an output to display.

In the remainder of this post, we’ll take a closer look at step 4.

The typechecker

The main workhorse of the typechecker is the check_expr method. It implements both type inference and type checking:

struct TypeChecker {
    // Lifetimes simplified for the sake of this post.
    env: Env<SourcedType>,
}

impl TypeChecker {
    pub fn check_expr(
        &mut self,
        expected: &SourcedType,
        expr_span: Span,
        expr: &mut Expr,
    ) -> Result<SourcedType> {
        // We'll see some of the implementation below.
    }
}

The TypeChecker struct itself only carries an Env, the environment that contains the type for every name in scope. To check an expression, we pass in:

The expected type. At the top level this is Type::Any. In most cases the expected type is passed down, but an expectation can also come from an annotation, and in some cases a syntactic construct prescribes the type. For example, the condition in an if-else expression must be a boolean.
The span (source location) of the expression. This is used to highlight the offending code in case of a type error.
The expression itself. It is mutable, because the typechecker may wrap the expression in a runtime type check.

The method returns the inferred type on success, or an error in case of a static type error. The implementation is a big match statement. For example, this is the check for an if-else expression:

let expr_type = match expr {
    Expr::IfThenElse {
        condition_span,
        condition,
        body_then,
        body_else,
        span_then,
        span_else,
        ..
    } => {
        self.check_expr(type_bool_condition(), *condition_span, condition)?;

        let type_then = self.check_expr(expected, *span_then, body_then)?;
        let type_else = self.check_expr(expected, *span_else, body_else)?;

        // The inferred type is the join of the two sides, which may be
        // more specific than the requirement (which they both satisfy).
        Typed::Type(type_then.join(&type_else))
    }

    // [Other match arms omitted in this snippet.]
};

At the end, the typechecker inserts a runtime check if needed:

match expr_type {
    // If the type check passed, great, we now know the inferred type.
    Typed::Type(t) => Ok(t),

    // If we couldn't check statically, then we have to insert a runtime
    // type check around this node. We have to sacrifice a temporary
    // NullLit to the borrow checker to swap the node into place.
    Typed::Defer(t) => {
        let mut tmp = Expr::NullLit;
        std::mem::swap(&mut tmp, expr);
        *expr = Expr::CheckType {
            span: expr_span,
            type_: expected.clone(),
            body: Box::new(tmp),
        };
        Ok(t)
    }
}

The Typed enum helps to distinguish between the two cases that the subtype check can run into. Type(T) means that we know statically that the type is T; Defer(T) means that we need to insert a runtime check, and if the check passes, then we have a value of type T.

Rust is a great language for implementing a typechecker like this, because it combines the best aspects of a functional language with low-level control over performance. Like Haskell it has algebraic data types and pattern matching, which make inspecting the AST safe and easy, but it also enables us to mutate the AST in place to wrap nodes, unlike Haskell where we would need to copy most of the AST to wrap an inner node.

The second workhorse of the typechecker is is_subtype_of. It is used for example when checking literals or variable lookups:

let expr_type = match expr {
    Expr::NullLit => {
        type_literal(expr_span, Type::Null)
            .is_subtype_of(expected)
            .check(expr_span)?
    }
    Expr::StringLit(..) => {
        type_literal(expr_span, Type::String)
            .is_subtype_of(expected)
            .check(expr_span)?
    }
    Expr::Var { span, ident } => match self.env.lookup(ident) {
        None => return span.error("Unknown variable.").err(),
        Some(t) => t.is_subtype_of(expected).check(*span)?,
    }
    // [Other match arms omitted in this snippet.]
};

The is_subtype_of method implements the subtype relationship implied by the type lattice:

impl SourcedType {
    pub fn is_subtype_of(
        &self,
        other: &SourcedType,
    ) -> TypeDiff<SourcedType> {
        match (&self.type_, &other.type_) {
            // Void is a subtype of everything, Any a supertype
            // of everything. They are the bottom and top of the
            // lattice.
            (Type::Void, _) => TypeDiff::Ok(self.clone()),
            (_, Type::Any) => TypeDiff::Ok(self.clone()),

            // [Some match arms omitted in this snippet.]

            // If we take an arbitrary value, is it a member of some
            // type T, when T is not `Any` (that case is already
            // covered above)? We don't know statically.
            (Type::Any, _) => TypeDiff::Defer(other.clone()),

            // [More match arms omitted here.]

            // If we have any other combination of types,
            // they are incompatible.
            _ => TypeDiff::Error(Mismatch::Atom {
                actual: self.clone(),
                expected: other.clone(),
            }),
        }
    }
}

Because the check is a generalized check, it returns one of three cases, represented by TypeDiff. A type diff is a precursor to Result<Typed>, and can hold a structured diff in case of an error.

/// The result of a subtype check `T ≤ U`
/// where `U` is expected and `T` encountered.
pub enum TypeDiff<T> {
    /// Yes, `T ≤ U`, and here is `T`.
    Ok(T),

    /// For `t: T`, we *might* have `t: U`.
    /// Here is `V`, the type of the value if the check passes,
    /// where `V ≤ T` and `V ≤ U`.
    Defer(T),

    /// For all `t: T`, we have that `t` is not a value of `U`.
    ///
    /// Or, in some cases this is not strictly true, but we want to
    /// rule out that case because it makes more sense. For example,
    /// we say that `List[Int]` and `List[String]` are incompatible,
    /// even though `[]` inhabits both.
    Error(Mismatch),
}

The check method on TypeDiff turns a TypeDiff into Result<Typed>. It formats the structured Mismatch into a printable error.

Sourced types

You might have noticed in the snippets above, most places use SourcedType rather than Type. Sourced types are part of the error reporting machinery. A sourced type is a pair of a type and its source:

pub struct SourcedType {
    pub type_: Type,
    pub source: Source,
}

A source describes where the type came from. It contains cases like these:

pub enum Source {
    /// There is no source for this type.
    ///
    /// This is the case for `Type::Any` when it is introduced by the
    /// typechecker for expressions that are not otherwise constrained.
    None,

    /// The type was inferred from a literal.
    Literal(Span),

    /// It was a type annotation in the source code.
    Annotation(Span),

    /// A boolean is required, because it's used as a condition
    /// in `if` or `assert`.
    Condition,

    /// The type is the required type of the operator at the span.
    Operator(Span),

    /// An integer is required due to indexing into a list.
    IndexList,
}

Like types, Source forms a lattice, which in turn makes SourcedType a lattice. When we join types for the sake of inference, we also join their sources. This is how for the example in the introduction, the inferred type List[Int] for ports can keep the Int pointing at one of the integer literals, so it can explain in the error why port has type Int. Let’s look at that example again:

let ports = [22, 80, 443];

// Use string keys so we can export as json.
let firewall_rules: Dict[String, String] = {
  for port in ports:
  port: "allow"
};

  |
6 |   port: "allow"
  |   ^~~~
Error: Type mismatch. Expected String but found Int.

  |
4 | let firewall_rules: Dict[String, String] = {
  |                          ^~~~~~
Note: Expected String because of this annotation.

  |
1 | let ports = [22, 80, 443];
  |              ^~
Note: Found Int because of this value.

Notice something else? The error points at the dict key ports specifically. We can do this because typechecking and inference are fused and top-down. When the typechecker enters the dict comprehension, it already has an expected type Dict[String, String]. The dict comprehension satisfies the Dict part, so inside the comprehension, we have an expected type for the key and value (both String).

Compare this to an approach where we would infer types bottom-up, and check compatibility afterwards. Then the typechecker would infer Dict[Int, String] for the dict comprehension, and it would have to report something like this:

  |
4 | let firewall_rules: Dict[String, String] = {
  |                                            ^
Error: Expected Dict[String, String] but found Dict[Int, String].

Now it’s up to the user to diff those types in their head, and then search the source code for where the violation happens. Pushing the expectation in top-down enables friendlier errors. This example may be a bit contrived, but it will make a big difference for record types, where types can grow big.

Incremental implementation

Having an Any type makes implementing type inference very forgiving. If there is a case where things get too complex, we can just return Any. For example, because record types are not yet implemented, all field lookups currently infer as Any. The more precise the inferred type, the more errors we can catch statically, but because we also have a runtime check, returning Any is not incorrect.

There is one more place where we can cut some corners: in the join implementation. Mathematically join should return the least upper bound, but in practice it’s fine if we return some upper bound. For example, join does not produce Union if none of the inputs are:

// If we annotate as List[Union[Int, Bool]] it typechecks,
// but without annotation the inferred type is List[Any].
let xs = [42, true];

This is partially laziness on my end, but also a matter of keeping the typechecker efficient, and keeping type errors tractable. If we inferred union types, type errors would quickly grow enormous.

Future work

Although the type system works well, it is not yet useful for large configurations in the real world. Three major features are needed for that:

Record types. By now I don’t expect they are hard to implement (function types already carry named arguments, which are similar), I just need to sit down and do it.
Type aliases. With record types, you don’t want to spell out the full type everywhere; you write it down once and then refer to it. I have a rough idea of what I want, but there are some unresolved questions. How should scoping work? The same as for expressions? Does that mean you can have local type definitions? Or should types only be allowed at the top level? Do they need to go before all expressions then, or can you mix let bindings and type definitions? Etc. I think this will be a matter of implementing something, getting a feel for it, and then iterating.
Importing types across files. Because every RCL document contains a single expression, it is very clear what importing files means on the value level. For types, I don’t think it is feasible to limit them to one exported type per file. The main blocker here is coming up with a syntax that is both legible and not completely hideous.

Aside from those, there are less important features that I want to add:

String literal types, so you can use a union of string literals to model enums.
Quantification and type variables, to be able to give a more accurate type to functions such as filter. Currently it is documented with a type variable, but in the implementation all type variables are just Any, so the typechecker loses track of the list element type after passing through filter.

Furthermore some parts are just not implemented, like method lookups. Fortunately the type system implementation is forgiving, and RCL is usable and useful right now without them.

Conclusion

RCL is a new configuration language that aims to reduce configuration boilerplate by extending json into a simple functional language that enables abstraction and reuse. I am adding support for type annotations and a typechecker to it, and in this post we looked at how that typechecker is implemented. We saw how the fused typecheck and inference, check_expr, is the cornerstone of the typechecker, with is_subtype_of implementing the generalized subtype check that defers some checks to runtime. On the more practical side, passing down an expected type enables localizing type errors accurately, and tracking the sources of types helps to explain type errors.

This post concludes my series about RCL’s type system. One of RCL’s goals is to be obvious and easy to use for people who have some background in mainstream languages like Python or TypeScript, but not necessarily typed functional programming. As such, the type system is relatively simple, and mostly putting together existing ideas. The main idea that I haven’t seen implemented elsewhere is the generalized subtype check. And to be fair, it is of limited use — the reason RCL can get away with it, is that runtime errors are static errors in a configuration language. I would love to hear from people more versed in the literature if something like this exists in other systems, and if there are common practices around it.

If this series got you interested in RCL, try RCL in your browser, and check out the type system documentation! I don’t recommend relying on RCL to generate production configuration yet: it is a hobby project without stability promise. However, I do use it almost daily as a jq replacement, and the new map and filter methods in v0.4.0 make it even nicer for that, so give it a try!