Releasing Bazel rulesets that publish tools

This post is a behind-the-scenes look at how we make Aspect's Bazel rules great. Unless you're a Bazel rule author who distributes rules to third parties, there's probably nothing for you to take away here. But if you're a fan of the thoughtfulness and polish we put into our developer experience, soak it up!

Here's a quick problem statement:

- To make good maintenance trivial, releases must be fully automated and triggered just by pushing a tag to the repository. (This also helps supply-chain security: we guarantee that the release is made from the sources which were tagged.)
- Where possible, our rules just spawn actions that call the existing third-party tools we wrap. But sometimes we're forced to spawn a program that we wrote, and thus need to distribute it.
- We don't want to leak our toolchain dependency to users. Forcing your users to compile your C++ or Go code is never as easy as it sounds. Even tools written in Python incur a penalty, such as rules_pkg requiring a Python interpreter installed on the machine. Users also shouldn't have to wait to build our tools (looking at you, protoc).
- We want to make it simple for users to switch between a tagged release and a git_override or equivalent, where they depend on a source archive based on an un-released SHA of our repo (or their fork of it).

Putting it together, we need to build our binaries and publish them with each release, and update our toolchain definition so that it can download them. At the same time, anyone using a "HEAD" or other source archive of the rules must get the toolchains needed to build the tools from source.

What's more, it needs to be stupidly simple. We mostly work on OSS Bazel rules in our spare time: we rarely get paid to work on them. We have over 20 rulesets now. About a third of the releases on the Bazel Central Registry are from us [1]. We don't want to spend time performing releases or fixing release machinery.

This puzzled us for a long time, and we've finally found a recipe that works, which I'll share here. I'll use our fantastic Bazel "standard library" as an example: bazel-lib, and step through chronologically what happens.

Before the release

As development goes on in the repository, we always keep CI green. That gives us an invariant: the integrity hashes of our own binaries are known and checked in at all times. So any time a developer makes a PR that changes the sources of our tools (written in Go), they'll also be asked to bazel run //tools:releases_versions_check_in so that this file stays current: /tools/integrity.bzl

It's worth the small inconvenience of vendoring this info into the repo, because it means we don't have to create any commits when we perform a release. That's important: if the hashes were only computed after the tag had been pushed, we'd have to add a commit to the repo and then re-tag (or move an existing tag, which is pretty naughty).

We use these integrity hashes to declare our "release" toolchains, such as /lib/private/copy_directory_toolchain.bzl. These are ready to be registered for users to be on the "happy path" of just downloading our release artifacts from GitHub's static content distribution network.
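To make the idea of a checked-in integrity file concrete, here is a minimal sketch in Python (not our actual tooling, which lives in the repo). It computes a "sha256-" plus base64 integrity string for each binary and renders it as a dict literal. The names sri_sha256, render_integrity_bzl, and RELEASED_BINARY_INTEGRITY, as well as the artifact names, are made up for illustration.

```python
import base64
import hashlib


def sri_sha256(data):
    """Return an integrity string of the form 'sha256-<base64 of digest>'."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def render_integrity_bzl(binaries):
    """Render a dict literal mapping artifact name to its integrity string.

    `binaries` maps a (hypothetical) artifact name to the file's bytes.
    Keys are sorted so the generated file is stable across runs.
    """
    lines = ["RELEASED_BINARY_INTEGRITY = {"]  # hypothetical constant name
    for name in sorted(binaries):
        lines.append('    "{}": "{}",'.format(name, sri_sha256(binaries[name])))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in bytes; a real run would read the built binaries from disk.
    fake_binaries = {
        "copy_directory-linux_amd64": b"fake linux binary",
        "copy_directory-darwin_arm64": b"fake mac binary",
    }
    print(render_integrity_bzl(fake_binaries))
```

Because the output is deterministic, a CI check can regenerate the file and fail the build if it differs from what's checked in.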

A tag is pushed

/.github/workflows/release.yml shows our GitHub Actions handler that is triggered by pushing a tag. This mechanism comes from https://github.com/bazel-contrib/rules-template which is now the recommended way to start a Bazel ruleset.

These are the steps in the workflow:

1. Run tests again, just to check that we didn't tag a red commit. We could look up the result from the run on main, but thanks to remote caching it's cheap to just run them again.
2. Run the release build. We are paranoid and check that there are no stray changes in the repo.
3. Run the release_prep.sh script. It has a few jobs:
   - The rules are distributed as an archive file, which we build by running git archive. We're careful to produce an archive with the same structure as what GitHub serves from their source archive endpoint, so that users can switch from releases to source archives with the fewest needed edits.
   - We make use of a little-known configuration affordance for git archive in /.gitattributes which lets us substitute a placeholder in the code. It can also exclude some folders that don't need to be shipped. This is WAY simpler than teaching Bazel how to produce our release artifact as a build output, because that requires a tree of filegroup rules that are a PITA to update. (We know, we've done it a lot.)
   - That placeholder goes into /tools/version.bzl, which is how our Starlark code can sense whether we are running from a release artifact or a source archive.
   - Also produce the release notes, which include the "how to install" snippet that will end up in our user documentation.
4. Use the https://github.com/softprops/action-gh-release action to publish the release to our GitHub repo, including auto-generated release notes and all our artifacts, ready to be downloaded.
5. Rely on https://github.com/bazel-contrib/publish-to-bcr to automatically mirror our new release to the Bazel Central Registry at registry.bazel.build
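The placeholder trick in step 3 can be sketched roughly as follows. This is Python rather than Starlark, and it only models the logic: git archive expands $Format:...$ placeholders in files marked export-subst in .gitattributes, while a plain checkout leaves them untouched. The exact placeholder string and the function names below are assumptions for illustration, not copied from our repo.

```python
# Assumed placeholder text; git archive replaces it with a tag description
# when the file carries the export-subst attribute.
UNEXPANDED = "$Format:%(describe:tags=true)$"


def effective_version(raw):
    """Map the raw (possibly unexpanded) string to a version number.

    If the placeholder was never substituted, report 0.0.0 so callers can
    tell they are not running from a release artifact.
    """
    if raw.startswith("$Format"):
        return "0.0.0"
    # Release tags are assumed to look like "v2.0.0"; drop the leading "v".
    return raw[1:] if raw.startswith("v") else raw


def is_source_archive(raw):
    """True when the version sentinel says the placeholder was not expanded."""
    return effective_version(raw) == "0.0.0"


if __name__ == "__main__":
    for raw in (UNEXPANDED, "v2.0.0"):
        print(raw, "->", effective_version(raw))
```

The nice property is that nothing has to be committed at release time: the same file yields a real version inside the release archive and the 0.0.0 sentinel everywhere else.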

The user runs toolchain resolution

Users are instructed to register toolchains using one of a couple of APIs. This is where we check whether VERSION is 0.0.0, which means we're running from a source archive: /lib/repositories.bzl

If it is, then we register a toolchain that builds the tools from source. If not, then we register a toolchain that downloads the binaries.
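In spirit, that branch looks something like the Python sketch below. The real logic is Starlark in /lib/repositories.bzl; here the function name plan_toolchain, the target label, the asset naming, and the example-org/example-rules URL are all hypothetical. Only the shape of the decision is taken from the description above.

```python
SOURCE_VERSION = "0.0.0"  # sentinel meaning "not a release artifact"


def plan_toolchain(version, platform, integrity_by_platform):
    """Decide how a tool should be provided for the given platform.

    Returns a small dict describing either a from-source build or a
    download of a prebuilt binary pinned by its integrity hash.
    """
    if version == SOURCE_VERSION:
        # Source archive: build the tool with the Go toolchain.
        return {"kind": "build_from_source", "target": "//tools/copy_directory"}
    asset = "copy_directory-{}".format(platform)  # hypothetical asset name
    return {
        "kind": "download",
        # Hypothetical repo; follows GitHub's releases/download/<tag>/<asset> layout.
        "url": "https://github.com/example-org/example-rules/releases/download/v{}/{}".format(
            version, asset
        ),
        "integrity": integrity_by_platform[platform],
    }


if __name__ == "__main__":
    hashes = {"linux_amd64": "sha256-placeholder"}  # stand-in value
    print(plan_toolchain("0.0.0", "linux_amd64", hashes))
    print(plan_toolchain("2.0.0", "linux_amd64", hashes))
```

Keeping the decision in one place means users call the same registration API either way and never have to know which path they are on.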

Hiding our dependencies

This plan has one hidden complexity under bzlmod. Bazel's MODULE.bazel file is supposed to list our dependencies, and it differentiates between development-only dependencies and those that should be exposed to users.

Our file /MODULE.bazel has to declare that the Go toolchain, several Go libraries, AND gazelle are real dependencies. That's because a source archive will have this file in it, and if you use a source archive, you'll need to be able to build the tools from source.

When our releases get published to the Bazel registry with Publish to BCR, the MODULE.bazel file will be patched, converting those Go dependencies to dev_dependency since users of a release artifact won't need them: /.bcr/patches/go_dev_dep.patch

We do this because we like our users! Even though bzlmod improves the story for Bazel transitive dependencies a bunch, we don't want users to accidentally get the version of these modules bumped due to the MVS algorithm taking our module's dependencies into account, nor do we want them to wait for Bazel to build them from source.
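To see why a regular dependency can bump a user's version while a dev dependency cannot, here is a deliberately simplified Python sketch of minimal version selection: walk every requested module version and keep the highest version requested for each module. It ignores compatibility levels, overrides, and everything else Bazel's real resolver does, and the module names and versions are made up.

```python
def parse(version):
    """Turn '0.42.0' into (0, 42, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_versions(root_deps, registry):
    """Pick the highest requested version of each reachable module.

    root_deps: {module: version} requested by the user's root module.
    registry:  {(module, version): {dep: version}} with only the deps
               visible to dependents (dev dependencies are omitted).
    """
    selected = {}
    seen = set()
    stack = list(root_deps.items())
    while stack:
        name, version = stack.pop()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        if name not in selected or parse(version) > parse(selected[name]):
            selected[name] = version
        stack.extend(registry.get((name, version), {}).items())
    return selected


if __name__ == "__main__":
    user = {"rules_go": "0.41.0", "our_rules": "2.0.0"}
    # Go dependency exposed to users: their rules_go gets bumped.
    print(select_versions(user, {("our_rules", "2.0.0"): {"rules_go": "0.42.0"}}))
    # Go dependency patched to a dev dependency: the user's choice stands.
    print(select_versions(user, {("our_rules", "2.0.0"): {}}))
```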

So if you look at our entry on BCR, https://registry.bazel.build/modules/aspect_bazel_lib/2.0.0, you'll see we take only three direct dependencies; the rest are dev dependencies.

Credits

Thanks to Sahin and Derek from Aspect for iterating through our options for making something so elegant!

[1] Clone bazelbuild/bazel-central-registry to see we pushed 213 of the 655 total releases on the registry:
$ for m in modules/*/*.json; do echo "$m $(jq '.versions | length' < "$m")"; done | grep -E "aspect|oci|nodejs|structure_test" | awk '{ sum += $2 } END { print sum }'