A reasonable configuration language

written by Ruud van Asseldonk
published

About six months ago, I was fed up with it. The particular it was HCL (HashiCorp Configuration Language), but that was just the trigger; it was hardly the only offender. The issue I was struggling with that day was to define six cloud storage buckets in Terraform. They were similar, but not quite identical. The kind of thing you’d do with a two-line nested loop in any general-purpose language, but where all the ways of achieving that in HCL were so much hassle that it was far simpler to just copy-paste the config six times.

Although this HCL episode was the droplet, my bucket of frustration had been filling up for a long time.

So that day, when I was in a particularly defiant mood, I decided to write my own configuration language. With list comprehensions. And types.


I never expected or intended for it to go anywhere — it was just a way to vent. But six months later, Ruud’s Configuration Language is no longer completely vaporware. I find it increasingly useful, and I think it might benefit others too. So let’s dive in!

A functional foundation

To be clear, I’m not criticizing the designers of Ansible or HCL. The limits of these tools are a natural consequence of their organic growth: you start out with a tool that needs simple configuration, adoption grows and people start doing more complex things with it, and suddenly you find yourself without a good way to do abstraction. So as a quick stopgap, you bolt on control flow encoded inside the data format, because that’s easy to do within the limits of the existing syntax.

When it comes to adding more principled abstraction features, the authors have a background in infrastructure administration, not in language design or type theory. So they accidentally implement some functions in an ad-hoc way that seemed helpful, but causes surprises down the line. (A flatten that sometimes flattens recursively can’t be typed properly, which breaks generic code.) Many of JavaScript and PHP’s idiosyncrasies can be explained in the same way.
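To make the flatten remark concrete, here is a minimal Python sketch. Neither function is taken from any real tool; `flatten_once` and `flatten_deep` are hypothetical names used only for illustration. The one-level version has a simple generic type, while the recursive version has to inspect its elements at runtime, so its signature cannot promise anything about what comes out.

```python
from typing import Any, TypeVar

T = TypeVar("T")


def flatten_once(xss: list[list[T]]) -> list[T]:
    """Remove exactly one level of nesting; elements are never inspected."""
    return [x for xs in xss for x in xs]


def flatten_deep(value: Any) -> list[Any]:
    """Flatten recursively: every element that is a list gets spliced in."""
    if not isinstance(value, list):
        return [value]
    result: list[Any] = []
    for item in value:
        result.extend(flatten_deep(item))
    return result


# Hypothetical generic use: the elements happen to be lists themselves
# (T = list[int]), for example coordinate pairs grouped per region.
per_region = [[[1, 2], [3, 4]], [[5, 6]]]

print(flatten_once(per_region))  # [[1, 2], [3, 4], [5, 6]]
print(flatten_deep(per_region))  # [1, 2, 3, 4, 5, 6]
```

The one-level version keeps the pairs intact whatever the element type is. The recursive version silently destroys them, which is the kind of surprise that breaks code written generically over the element type.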

The Nix language had a more solid foundation in functional programming from the start, which enables abstraction in a natural way. Even though it predates Terraform by more than a decade, the language has stood the test of time far better than HCL. With very few changes, it scaled to massive configuration repositories like Nixpkgs, and although Nix has issues, abstracting repetition away is not one of them. I’ve used Nix to generate repetitive GitHub Actions workflows, and of course it is at the heart of NixOS, where it generates configuration files such as systemd units from a consistent declarative specification. This is the power of having few simple features that compose well.

Though Nix is great, I don’t think it is the answer to all configuration problems. Nix-the-language is intimately tied to Nix-the-package-manager and the Nix store, and the ML-style syntax can look foreign to people who are used to more mainstream languages. Still, Nix has many good ideas that have been proven to work, and my own configuration language is heavily inspired by it.

The other language that I take a lot of inspiration from is Python. Python is not primarily a functional language, but you can certainly use it in that way (avoid mutation, write pure functions, prefer list comprehensions over loops, etc.), and this is very natural. I find the syntax pleasant and readable: the meaning of idiomatic Python code is clear even to people who are not intimately familiar with the language. As a configuration language, Python is not bad! In fact, I’ve also used Python to generate repetitive GitHub Actions configurations. List and dict literals are very similar to json, and with functions, list comprehensions, and format strings, there is ample room to abstract repetitive configuration. Types can help to document and enforce structure.
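As a rough illustration of that point, the sketch below generates six similar buckets with a function, a nested list comprehension, and a format string. The locations, tiers, and retention periods are made up for the example; only the field names follow the bucket examples further down in this post.

```python
import json

# Hypothetical inputs, chosen only to get 3 x 2 = 6 buckets.
LOCATIONS = ["eu-west1", "us-east1", "asia-east1"]
RETENTION_DAYS = {"hot": 1, "cold": 30}


def make_bucket(location: str, tier: str, days: int) -> dict:
    """Build one bucket definition as a plain dict."""
    return {
        "name": f"bucket-{location}-{tier}",
        "location": location,
        "delete-after-seconds": days * 24 * 3600,
    }


def make_config() -> dict:
    """The nested loop: one bucket per (location, tier) pair."""
    return {
        "buckets": [
            make_bucket(location, tier, days)
            for location in LOCATIONS
            for tier, days in RETENTION_DAYS.items()
        ]
    }


if __name__ == "__main__":
    # Entry point: dump the generated configuration as json on stdout.
    print(json.dumps(make_config(), indent=2))
```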

But like Nix, I don’t think that Python is the answer to all configuration problems. A Python file is still primarily code, not data. You can have an entry point json.dump data to files or stdout, but it’s not always easy to import or evaluate intermediate pieces in isolation. Python’s module system is great for larger codebases, but less suitable for sharing pieces of data between many small scripts.

For my own language, I took the parts that I like about Nix: functional, more data than code, but with enough room to code when needed, and simple features that compose well. I took what I like about Python: the clean and familiar syntax, list comprehensions, format strings, and types. And consciously or unconsciously, I’m influenced by many more languages that I’ve been exposed to. Those ideas I combined into a language that I like working with.

Oh no, yet another configuration language!

I am not the first person to be frustrated by the lack of abstraction features in various tools, nor am I the first person to think that a configuration language would solve that. There exist more configuration languages than I can count on one hand already (see the appendix), and probably many more that I’m not aware of. So why add one more to the mix? Why is this one going to really solve all our problems, when five more mature ones haven’t seen widespread adoption (yet)?

First of all, I did not start out writing my own language thinking it would be a viable alternative to existing configuration languages. I started it to vent, because I find it fun to work on, because it’s a good learning exercise, and because I can do things in exactly the way that I want to. Dhall has been around longer, has wider support, and a bigger community. But I don’t really like the syntax and the way it names some things. That’s a superficial complaint, and if I were looking for a tool to solve my configuration problems with the least amount of effort, I could set my taste aside; I would get used to it. But for a personal project that I spend my free time on, I enjoy exploring ideas and building exactly the tool that I want to have.

So that’s how it started, as a toy project. I put a big vaporware warning on it, expecting that I would lose interest in it before it got to a point where it was useful. It’s certainly not the first toy language I have written, and an earlier one stalled, so maybe this one will meet the same fate. (I do still occasionally use Pris, and occasionally I get excited about adding features, but it’s mostly abandoned, like many of my side projects.) But then my tool started being useful. First in unexpected places (as a jq replacement, more on that below), and as I added features, in more places, to the point where now, despite its shortcomings, I would prefer it over some of the tools that I use at my day job.

So now what, is it a Serious Software Project now? No, it’s still a hobby project without stability promise. I don’t recommend using it for anything serious. But it’s also useful to the point where I expect I’ll keep it in my toolbelt for the foreseeable future, if only as a jq replacement. And if it’s useful to me, maybe it’s useful to others, so that’s why I’m writing about it today.

Ruud’s Configuration Language

So what is this language? I call it RCL, named after myself in Bender meme style, but it turns out that rcl is a pretty good file extension and name for a command-line tool. If you prefer, it might stand for Reasonable Configuration Language, or, in classic GNU style, for RCL configuration language.

The language is a superset of json. This makes it easy to export data from many tools and incrementally upgrade it to RCL, including from yaml: just serialize it to json, and you’re good to go. This is a valid RCL document:

{
  "buckets": [
    {
      "name": "bucket-0",
      "location": "eu-west1",
      "delete-after-seconds": 86400
    },
    {
      "name": "bucket-1",
      "location": "eu-west1",
      "delete-after-seconds": 86400
    }
  ]
}

It’s 2024, so RCL has some features that you might expect from a “modern” language: trailing commas and numeric underscores. Furthermore, dicts can be written with ident = value syntax to omit the quotes and reduce some line noise:

{
  buckets = [
    {
      name = "bucket-0",
      location = "eu-west1",
      delete-after-seconds = 86_400,
    },
    {
      name = "bucket-1",
      location = "eu-west1",
      delete-after-seconds = 86_400,
    },
  ],
}

There are arithmetic expressions as you would expect, list comprehensions, format strings, and functions:

{
  buckets = [
    for i in std.range(0, 2):
    {
      name = f"bucket-{i}",
      location = "eu-west1",
      delete-after-seconds = 24 * 3600,
    },
  ],
}

For validation, the key_by method is useful. In the above example, if we’d name the buckets by hand and there are many of them, how do we ensure that we don’t accidentally create two buckets with the same name? We can do that by building a mapping from name to bucket:

let buckets = [
  // Omitted here for brevity, defined as before.
];

// Build a mapping of bucket name to bucket. If a key (bucket name)
// occurs multiple times, this will fail with an error that reports
// the offending key and the associated values. The type annotation
// is for clarification, it is not mandatory.
let buckets_by_name: Dict[String, Dynamic] = buckets.key_by(b => b.name);

// Constructing the mapping is enough for validation, the document still
// evaluates to the same dict as before. Note, the left "buckets" is the
// name of the field, the right "buckets" is a variable reference.
{ buckets = buckets }
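For readers who think in Python, the duplicate check described above behaves roughly like the sketch below. This is an analogue written for this explanation, not how RCL implements key_by, and the wording of the error message is made up.

```python
from typing import Callable, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def key_by(items: list[V], get_key: Callable[[V], K]) -> dict[K, V]:
    """Map each item's key to the item, failing on a duplicate key."""
    result: dict[K, V] = {}
    for item in items:
        key = get_key(item)
        if key in result:
            # Report the offending key and both values that share it.
            raise ValueError(
                f"Key {key!r} occurs more than once: {result[key]!r} and {item!r}"
            )
        result[key] = item
    return result


buckets = [
    {"name": "bucket-0", "location": "eu-west1"},
    {"name": "bucket-1", "location": "eu-west1"},
]
buckets_by_name = key_by(buckets, lambda b: b["name"])
print(sorted(buckets_by_name))  # ['bucket-0', 'bucket-1']
```

Adding a second bucket named "bucket-0" to the list makes the call raise instead of silently keeping one of the two, which is the property that makes building the mapping useful as validation.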

This is just a quick overview of some features. For a more thorough introduction, check out the tutorial and the syntax guide.

An RCL document is always an expression, and you can evaluate it to a json document with the rcl command-line tool:

rcl evaluate --output=json buckets.rcl

The tool can also output in RCL syntax, which is a bit less noisy when inspecting data, and it’s a way to upgrade json documents to RCL. Aside from the standalone command-line tool, I also recently added a Python module that enables importing RCL documents in much the same way as json.loads.

Abstraction in a single document is nice, but the real power comes from imports. These allow you to break down configuration into small reusable pieces. Let’s say that all your cloud resources are in the same location. Then we might have a file cloud_config.rcl:

{
  default_location = "eu-west1",
}

Then in buckets.rcl, we can use that like so:

let cloud_config = import "cloud_config.rcl";
{
  buckets = [
    for i in std.range(0, 2):
    {
      name = f"bucket-{i}",
      location = cloud_config.default_location,
      delete-after-seconds = 24 * 3600,
    },
  ],
}

Because every document is an expression, you can always evaluate it and inspect it, even if it’s only an intermediate stage in a larger configuration. For more fine-grained inspection there is trace, and with rcl query you can evaluate an expression against a document to drill down into it. For example, to look only at the first bucket:

rcl query buckets.rcl 'input.buckets[0]'

This feature is what made RCL useful for a use case that I did not anticipate: querying json documents.

An unexpected jq replacement

I use jq a lot. Most of the time, only to pretty-print a json document returned from some API. Because RCL is a superset of json, rcl can do that too now:

curl --silent https://api.example.com | rcl evaluate

By itself that is nothing special; the true power comes when querying. Jq features its own stream processing DSL, and for simple expressions I can usually remember the syntax: unpack the list, extract a few fields. But when it gets more complex, I’m at a loss. A while ago, I was dealing with a json document that had roughly this structure:

[
  { "name": "server-1", "tags": ["amd", "fast"] },
  { "name": "server-2", "tags": ["intel", "slow"] },
  { "name": "server-3" },
  { "name": "server-4", "tags": ["amd", "vm", "slow"] }
]

I wanted to know the names of all the machines that had a particular tag applied. That the tags field is missing from some machines complicates that, and the real input consisted of hundreds of machines, so fixing that by hand was not feasible. I spent about 10 minutes struggling with jq and scrolling through unhelpful Stack Overflow answers. I did not think to try ChatGPT at the time, but in hindsight it almost gets the query right to a point where I could then get it working myself. But fundamentally, these kinds of queries come up so infrequently that the things I learn about jq never really stick. ChatGPT is no excuse to tolerate bad tools: if the one-liner is easy to write, that’s still faster than leaving your terminal. At that point I remembered: I have a language in which this query is straightforward to express, and it can import json!

$ rcl query --output=raw machines.json '[
  for m in input:
  if m.get("tags", []).contains("amd"):
  m.name
]'
server-1
server-4

That’s how RCL, even though it is intended as a configuration language, became one of my most frequently used query languages.
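Part of why the query is easy to remember is that it reads almost like a Python list comprehension. For comparison, here is a Python sketch of the same filter over the sample data above; the helper name `names_with_tag` is made up for this example.

```python
import json

# The sample document from above, inlined to keep the sketch self-contained.
MACHINES_JSON = """
[
  { "name": "server-1", "tags": ["amd", "fast"] },
  { "name": "server-2", "tags": ["intel", "slow"] },
  { "name": "server-3" },
  { "name": "server-4", "tags": ["amd", "vm", "slow"] }
]
"""


def names_with_tag(machines: list[dict], tag: str) -> list[str]:
    """Names of machines that carry the tag; a missing tags field means no tags."""
    return [m["name"] for m in machines if tag in m.get("tags", [])]


machines = json.loads(MACHINES_JSON)
for name in names_with_tag(machines, "amd"):
    print(name)
```

As in the RCL query, the default passed to `get` is what handles the machines without a tags field.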

The future of RCL

That day when I was fed up with HCL and I ran git init, I didn’t expect to produce anything useful aside from entertaining myself for a few evenings. Now six months later, RCL is no longer vaporware, and it regularly solves real problems for me!

Some parts of RCL are already quite polished. It has mostly good error reporting, there is reference documentation, it has an autoformatter, and it is very well tested with a suite of golden tests and fuzzers. Although I’m not sure at what point it starts being worth the complication of an additional tool, RCL can define cloud storage buckets today with Terraform’s json syntax. But RCL is also far from ready for prime time: there is no syntax highlighting for any editor aside from Vim, the type system is a work in progress, it doesn’t support floats yet, the Python module doesn’t expose errors nicely, the autoformatter has quirks, and I’m still ambivalent about whether there should be a : after else.

But most of all, I’m not sure whether I want RCL to experience prime time. Of course it is very gratifying to see your project be adopted and solve real-world problems for other people. I’m proud of what I built so far and I want people to see it and try it — that’s why I publish everything as free and open source software, and that’s why I’m writing this post. It always cheers me up when somebody who found one of my projects useful or interesting sends me an e-mail. But I also already experience a bit of maintainer fatigue from some of my successful Rust crates, and I don’t always spend the time on them that they deserve. When a project takes off, inevitably users start making requests, having opinions, and submitting well-intentioned but low-quality contributions. Keeping up with that takes time and mental energy. I like working on RCL right now, because I get to build it in exactly the way I want, and it solves exactly the problems that I have. Building a tool for the open source community would require making different trade-offs. For now, I’m treating it as a source-available project. It solves a need for me, and if others find it useful that’s great, but it is provided as-is. Maybe Haskell’s avoid success at all cost isn’t such a bad idea.

Appendix: A non-exhaustive list of configuration languages

Aside from Nix, Python, and HCL, which I’ve already discussed extensively, I am aware of the following configuration languages. For the ones that I’ve used or at least evaluated briefly, I added my personal impressions, but beware that these are very superficial.

Bicep — Microsoft’s DSL for configuring Azure resources declaratively. I haven’t looked into it in much detail because I don’t work with Azure, but it looks potentially interesting.

Cue — Out of all the configuration languages that I evaluated during a company hackathon, I found Cue to be the most promising one. Its type system is interesting: it helps to constrain and validate configuration (as you would expect from a type system), but it also plays a role in eliminating boilerplate. Like Nix, Cue is based on few simple constructs that compose well, and grounded in solid theory. It took me some time before it clicked, but when it did, Cue became really powerful. A few things I don’t like about it are the package/module system that has its roots in the Go ecosystem, and its string interpolation syntax which is hideous. The command-line tooling works but could be more polished, and I found it to become slow quickly, even for fairly small configurations. It has a page comparing itself against a few other configuration languages.

Dhall — This is the first configuration language that I learned about many years ago. From what I can tell, it is one of the most mature and widely supported configuration languages. I use Spago, the PureScript package manager, in some of my projects, and it uses Dhall as its configuration format. Unfortunately it looks like it is being deprecated in favor of yaml. I tried to use Dhall once to solve an Advent of Code challenge, but got stuck immediately because it’s not possible to split strings in Dhall. Of course, this is an unfair test to evaluate a configuration language on, but it does give an impression of the expressivity of a language. I’ve used Nix to solve a few Advent of Code challenges in the past, and this year I solved a few in RCL, which went pretty well for small inputs, but the lack of unbounded loops and tail calls makes it unsuitable as a general-purpose language. Although I used to work as a Haskell developer, the formatting and names of built-in functions in Dhall look awkward to me.

JSON-e — A json parametrization language. I discovered this one in Rimu’s list of related projects. I think I’ve seen it mentioned a few times before, but I haven’t evaluated it at all.

Jsonnet — I never properly evaluated Jsonnet, but probably I should. Superficially it looks like one of the more mature formats, and in many ways it looks similar to RCL. It has a page comparing itself against other configuration languages.

KCL — This is an odd one. From the website and repository it looks like a lot of resources went into this project, but somehow I’ve never seen it come up or be used anywhere. I only learned about it when I started searching for configuration languages. From the way it describes itself, it sounds like the tool I want, but I am generally wary of tools that use lots of buzzwords, especially when it involves the words “modern” and “cloud native”. I should evaluate it properly at some point. It has a page comparing itself against other configuration languages.

Nickel — An attempt to create a language similar to Nix, but without being tied to the package manager and Nix store. It looked very promising to me, but after evaluating it during a company hackathon, I found it difficult or impossible to express sanity checks that I can easily express in Cue and RCL. It has a page comparing itself against other configuration languages.

Pkl — A configuration language by Apple. The timing is eerie: I wrote this post on a Saturday with the intention of proofreading and publishing it the next day, and right that Sunday morning, the Pkl announcement post was on the Hacker News frontpage. From the comments, it has been in use at Apple internally for a few years already. I haven’t had the opportunity to evaluate it yet. It has a page comparing itself against other configuration languages, but only superficially.

Pulumi — Not a configuration language, but an infrastructure automation tool like Terraform. It can be configured using existing general-purpose programming languages. I haven’t had the opportunity to try it, but I suppose I don’t get to complain about HCL without at least acknowledging Pulumi’s existence.

Rimu — I stumbled upon this one recently while browsing the configuration-language tag on GitHub. It might be an eerie case of parallel discovery: like RCL, it looks like a configuration language developed as a side project, written in Rust, started in August 2023, and not ready for serious use yet. Unlike RCL, its syntax is based on yaml.

Starlark — A Python dialect used by the Bazel build tool. I used it intensively when I was working with Blaze/Bazel, and it works well for defining build targets. Starlark has multiple implementations, including one in Rust that can be used as a standalone command-line tool, but all the implementations clearly focus on being embedded. From my limited attempts to use them in an infrastructure-as-code repository, they are not suitable for incremental adoption there.

TypeScript — Not a configuration language, but it deserves a mention here, because RCL intends to be json with abstraction and types, and since TypeScript is a superset of JavaScript, which is a superset of json, it falls in the same category of tools that can type and abstract json. I haven’t used TypeScript enough to have a strong opinion on its type system. Possibly RCL’s type system will end up being similar.
