Multiple external dependency closures in Bazel
I frequently see organizations moving to a monorepo, where applications or services depend on different versions of third-party libraries, and facing a decision. Should they align these versions, following a "single version policy"? Should they allow every application to manage its own separate list of dependencies? Or is there some approach in the middle? What are the tradeoffs between these solutions, and how does Bazel affect the decision?
Here's someone asking this recently, which prompted me to finally post about this: reddit.com/r/bazel/comments/115tqh0/why_the..
I'll give some quick answers here. If you'd like to get detailed answers for your codebase, you can book directly on my calendar: calendly.com/alexeagle
First of all, I cannot recommend this article enough, by my college CS teaching assistant and co-author of the Go language: research.swtch.com/deps
The short takeaway: taking a dependency on an external library seems like a convenient and obvious shortcut for developers ("never try to write your own datetime parsing code") but in practice the hidden, deferred costs mean it's often the wrong choice.
For the rest of this article, let's assume that the external dependencies are legitimately needed and need to be fetched and made available at build-time and/or run-time for the application.
At a high level, all languages look the same:
The developer expresses the dependencies they take and the version constraint, which could be "any version", "at least this version", "a version that is semver-compatible, like 2.x", or sometimes "exactly the following version". That last one is misleading because it pretends to "pin" the dependency for reproducibility, but you have to pin transitive dependencies too, which leads to:
You run a "constraint solver" to determine a complete "transitive closure" of dependencies, which satisfies all the developer's constraints as well as those of the external libraries. You could do this on-the-fly, but for reproducibility you should write the result as a "lockfile" in the source tree, ideally including integrity hashes of those files to defend against supply-chain attacks.
The dependency lockfile is provided to Bazel. In some cases it's translated to Starlark, so that Bazel's downloader fetches the packages. This allows Bazel's downloader configuration to handle things like providing a read-through proxy, and also lets the Bazel repository cache hold onto these. Some rulesets just rely on the package manager tool to do the downloads instead.
The Bazel rules expose each direct dependency as a "label" so you can include it in the `deps` of the code that imports from that dependency, bringing the external libraries into your dependency graph. That graph spans both first-party and third-party libraries, which is where the trouble is going to start.
I'll write about the problem in general, but first I'll translate these for each language I've studied closely.
Python
There are many ways to express your dependencies, because the Python ecosystem has lots of competing standards. Bazel's rules_python prefers the `requirements.txt` format for expressing the dependencies and their constraints, and expects a lockfile which is also in that format. It provides a rule to run `pip-compile` from pypi.org/project/pip-tools to run the constraint solver, and a test that verifies the lockfile is up-to-date. Aspect's rules_py relies on rules_python to do this.
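A sketch of that setup follows; the file and target names here are my own, not prescribed by rules_python, and the macro's attribute names have varied between releases:

```python
# BUILD.bazel -- a sketch; file and target names are hypothetical.
load("@rules_python//python:pip.bzl", "compile_pip_requirements")

# Creates :requirements.update, which runs pip-compile to regenerate the
# lockfile, and :requirements_test, which fails when the lockfile is stale.
compile_pip_requirements(
    name = "requirements",
    requirements_in = "requirements.in",
    requirements_txt = "requirements_lock.txt",
)
```

After editing `requirements.in`, you'd run `bazel run //:requirements.update`, and let CI run the generated test to catch stale lockfiles.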
When the third-party dependency is a "source distribution" for the platform/architecture you install on, `pip install` is run as a repository rule in Bazel. This is pretty terrible and ought to be avoided, but the recipe for avoiding it today is "use binary wheels only", which in practice means you have to supply those yourself. There's good work going on in the community here, so better answers may be coming.
The `pip_parse` repository rule (or module extension in bzlmod) converts the locked requirements to BUILD files. It uses the `pip install` tool to do the downloads, not the Bazel downloader. Only the packages needed for the requested build outputs are installed, so this is incremental.
You can have multiple `pip_parse` calls with different names, like `your_pypi_deps`, within a single Bazel workspace, so it's trivially possible to have multiple dependency "transitive closures".
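A two-closure setup looks something like this (repository and file names are hypothetical):

```python
# WORKSPACE -- a sketch with two separate transitive closures.
load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "app_a_pypi_deps",
    requirements_lock = "//app_a:requirements_lock.txt",
)

pip_parse(
    name = "app_b_pypi_deps",
    requirements_lock = "//app_b:requirements_lock.txt",
)
```

Targets under app_a then depend on `@app_a_pypi_deps//...` labels while app_b uses its own repository, so the two closures stay separate per-binary.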
JavaScript
Everyone has standardized on `package.json` to express your dependencies, but each package manager has its own lockfile format. Aspect's rules_js supports the `pnpm-lock.yaml` file directly, and also allows on-the-fly import of npm or yarn lockfiles.
rules_js uses Bazel's downloader to fetch these packages.
You can have multiple `npm_translate_lock` calls with different names, like `your_npm_deps`, within a single Bazel workspace, so it's trivially possible to have multiple dependency "transitive closures".
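Such a setup might look like this (the names are hypothetical, and the load path has varied between rules_js versions):

```python
# WORKSPACE -- a sketch; assumes aspect_rules_js is already fetched.
load("@aspect_rules_js//npm:repositories.bzl", "npm_translate_lock")

npm_translate_lock(
    name = "frontend_npm_deps",
    pnpm_lock = "//frontend:pnpm-lock.yaml",
)

npm_translate_lock(
    name = "tools_npm_deps",
    pnpm_lock = "//tools:pnpm-lock.yaml",
)
```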
Node.js is unique among language runtimes in that it supports multiple versions of the same library in a single application. The resolution algorithm of `require` walks up the `node_modules` tree starting from the callsite and takes the first match, so code in two different locations can get different results.
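For example, given this (hypothetical) on-disk layout, the same `require("lodash")` call resolves to two different versions depending on where it is made:

```
node_modules/
  lodash/              # v4 -- resolved by code at the package root
  some-lib/
    node_modules/
      lodash/          # v3 -- resolved by code inside some-lib
```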
Go
Go introduced a "module" system in version 1.11, which is the dependency manager used under Bazel. It expects a `go.mod` file in the root of a Go module. A `go.sum` file provides the lockfile; however, unlike other rulesets, rules_go doesn't read `go.sum` when installing dependencies. Instead, typical usage runs the `update-repos` command from Gazelle, which independently solves the version constraints and writes the result as `go_repository` calls in Starlark, typically into a macro living in `go.bzl`, recording an (importpath, version, sum) triple for each module.
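The generated macro looks roughly like this (the module, version, and hash values are placeholders):

```python
# go.bzl -- a sketch of what Gazelle writes; values are placeholders.
load("@bazel_gazelle//:deps.bzl", "go_repository")

def go_dependencies():
    go_repository(
        name = "com_github_example_lib",
        importpath = "github.com/example/lib",
        version = "v1.4.2",
        # Integrity hash, as found in go.sum.
        sum = "h1:placeholder=",
    )
```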
Go treats semantic versioning differently than other languages: a new major version (v2.0.0) gets a different module path (my.com/module/v2), so it's easier to keep a single version policy. If different applications use different major versions of the same library, those can live side-by-side, since they're seen as two different modules.
Gazelle's `update-repos` command does allow multiple transitive closures to be installed.
OCI (Open Container Initiative)
rules_oci is an alternative to rules_docker. OCI (and Docker) already use a content-addressed scheme, referring to remote images by digest, so a lockfile isn't required; rules_oci will warn you if you use a tag like `latest` to refer to your dependencies. It uses Bazel's downloader to fetch the manifests and layers of your base image.
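Pulling a base image by digest looks roughly like this (the digest below is a placeholder, and since rules_oci is pre-1.0 the API may shift):

```python
# WORKSPACE -- a sketch; the digest value is a placeholder.
load("@rules_oci//oci:pull.bzl", "oci_pull")

oci_pull(
    name = "distroless_base",
    image = "gcr.io/distroless/base",
    # Pinning by digest keeps the fetch reproducible; a mutable tag
    # like "latest" would draw a warning from rules_oci.
    digest = "sha256:0000000000000000000000000000000000000000000000000000000000000000",
)
```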
This ruleset is still pre-1.0 so I won't go into much detail yet, as things are subject to change.
Many versions policy
Here we just model the many-repo world, where each application has its own set of dependencies. As one case study, a large finance company I worked with has about 80 different requirements.txt files and transitive dependency closures.
- Skew is the downside of this approach. A given external dependency at different versions will likely be reachable along multiple dependency paths, and it's hard to predict which version you end up with. rules_python, for example, constructs a `sys.path` in the runtime stub with dependencies in an arbitrary order, and the interpreter ends up with whichever one happens to be first. This can easily violate dependency constraints: you use library X@1, which needs Y>=2, but Y@1 is picked up first, probably making X misbehave or crash. In practice it will often work out okay, but when it doesn't, you'll spend a long time figuring out why.
- Management is harder. You'll have many dependency files and lockfiles, many calls to the Bazel repository rules that translate them to Starlark, and many external `@path_to_myapp_deps` repositories to depend on. Tools like Gazelle may not understand which transitive dependency closure should be added to `deps` to satisfy an import.
- Speed of migration is an advantage here. You can reduce the effort required to migrate to a Bazel monorepo.
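The skew failure mode above is easy to reproduce in plain Python: whichever closure appears first on `sys.path` silently wins. (The module and closure names below are invented for the demo.)

```python
import pathlib
import sys
import tempfile

# Simulate two dependency closures, each shipping its own copy of a
# module named shadowed_lib at a different version.
root = pathlib.Path(tempfile.mkdtemp())
for closure, version in [("closure_a", "1.0"), ("closure_b", "2.0")]:
    pkg_dir = root / closure
    pkg_dir.mkdir()
    (pkg_dir / "shadowed_lib.py").write_text(f"VERSION = {version!r}\n")

# The runtime stub happened to put closure_a first, so its copy shadows
# closure_b's -- no error, no warning.
sys.path[:0] = [str(root / "closure_a"), str(root / "closure_b")]
import shadowed_lib

print(shadowed_lib.VERSION)  # → 1.0; closure_b's 2.0 is never seen
```

If closure_b's code actually requires version 2.0 features, the failure surfaces far from the real cause: the order of entries on `sys.path`.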
Single version policy
This approach to external dependencies is based on a philosophy that there should be a single transitive closure of external dependencies for the entire Bazel workspace. The monorepo governance group (the `CODEOWNERS` of the root folder) is responsible for making dependencies work. This is how Google does things.
In practice, there is usually some need for an exception for "big breaking changes" where a second version has to be made available during a migration window. Applications are switched over one-at-a-time, then finally the old version is removed.
- Updates are a big deal, since changing an external dependency version will immediately make all applications in the workspace pick up the change. This can only work when applications have decent automated test coverage. This is a cultural change from multi-repo, because the engineer who does the upgrade is now responsible for any fixes needed across the whole workspace. At the organization-level, this is a feature: you get a better economy of scale if only one engineer needs to learn the details of the upgrade, and other teams get the benefits for free. At the team-level this is a bug, because it takes longer for this engineer to do the upgrade than it would have in a multi-repo.
- Aligning dependencies is a first migration step. We've written one-off tools to walk a multi-repo setup and find the greatest-common version (Go uses MVS, and you can take a similar approach). Then you do pre-factoring steps to change application dependency versions to match the single version policy and roll out the application. If the change sticks, you've reduced the mismatch. When the mismatch goes to zero, you can drop the separate transitive dependency closure for that application.
- Solving constraints gets harder. In a huge repository, it's possible in theory that no single version file can cover all the external libraries, because two of them may share a dependency with no version that satisfies both. In practice, we've always found it possible in a medium-sized repository to wiggle the situation loose, though sometimes it requires getting a fix upstream in some library to relax its constraint (for example, when it declares a meaningless upper-bound constraint).
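The "greatest-common version" walk can be sketched as a toy; this assumes every file pins exact `name==x.y.z` versions, which a real tool would need to generalize to ranges:

```python
from collections import defaultdict

def greatest_versions(requirement_files):
    """Return the highest pinned version seen for each package."""
    best = defaultdict(tuple)
    for lines in requirement_files:
        for line in lines:
            name, _, version = line.partition("==")
            parsed = tuple(int(part) for part in version.split("."))
            if parsed > best[name]:
                best[name] = parsed
    return {name: ".".join(map(str, v)) for name, v in best.items()}

aligned = greatest_versions([
    ["requests==2.28.2", "urllib3==1.26.14"],  # app A's pins
    ["requests==2.31.0"],                      # app B's pins
])
print(aligned)  # → {'requests': '2.31.0', 'urllib3': '1.26.14'}
```

Each application is then pre-factored toward the aligned version, one rollout at a time.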
A couple versions policy
This is a useful middle ground. There is still a governance group preventing divergence ("you cannot make a new `requirements.txt` file until you make your case to us for why you need it").
- Disjoint dependency graphs: this approach works best when there are truly disjoint graphs, meaning that applications in subgraph A don't depend on any of the libraries in subgraph B, and they don't share any external dependencies either. In this case you won't run into any of the version skew bugs from the "many-versions policy".
- External hosted runtimes can force you into this approach. For example you may deploy code to an external service like a Cloud Lambda or a Snowflake data warehouse. They may constrain the language version or version of a library that you must use. This extra constraint can make your "single version" policy unsolvable.