Bazel + TypeScript: faster with Remote Execution

This post will show how much faster TypeScript builds can be when using remote execution, Bazel's unique ability to parallelize transpile and type-check work across a farm of machines. We hope that Bazel 6.0 will include fixes for symlinks support, making it possible to use remote execution with Aspect's rules_ts.

How much faster?

Using a remote execution cluster of 100 executors provided by our partners at EngFlow, we benchmarked a generated TypeScript application with 10M lines of code, representing a large-scale enterprise application. It built 8.4x faster with remote execution than locally on a 16-core MacBook Pro: 2 minutes and 13 seconds with remote execution vs. 18 minutes and 35 seconds locally. Full benchmark results are found further down in this post.

EngFlow provides Remote Execution as a service and comes with a Build and Test UI that provides valuable insights into your build.

Scale Horizontally

With remote execution, your build is no longer bound by the resources of your local or CI machines. You can keep build times fast for even the largest TypeScript code bases by scaling horizontally, that is, by increasing the number of remote executors.

The benchmarks in this post used 100 remote executors. Had we doubled the number of remote executors, we would expect build times to drop by roughly half, provided the build graph is wide enough to keep them busy. How much you gain from increasing remote execution compute depends on how wide your build graph is and how many actions can be run in parallel.
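To make the point about graph width concrete, here is a minimal sketch of a toy scaling model. It assumes work spreads perfectly across executors and that the longest chain of dependent actions (the critical path) cannot be shortened by adding machines. The function name and all numbers are hypothetical and are not measurements from this benchmark.

```python
def estimated_build_seconds(total_work_s: float, critical_path_s: float, executors: int) -> float:
    """Toy estimate: perfect parallelism, bounded below by the critical path."""
    if executors < 1:
        raise ValueError("executors must be >= 1")
    return max(critical_path_s, total_work_s / executors)


# Hypothetical graphs with the same total work but different critical paths.
graphs = {
    "wide": {"total_work_s": 12_000.0, "critical_path_s": 30.0},
    "narrow": {"total_work_s": 12_000.0, "critical_path_s": 100.0},
}

for name, graph in graphs.items():
    t100 = estimated_build_seconds(executors=100, **graph)
    t200 = estimated_build_seconds(executors=200, **graph)
    print(f"{name}: 100 executors -> {t100:.0f}s, 200 executors -> {t200:.0f}s")
```

In this model the wide graph halves its time when executors double, while the narrow graph stops improving once it hits its critical path.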

Fixes coming in Bazel core

Why doesn't this work with Bazel 5?

rules_ts is built on top of rules_js, which uses a symlinked node_modules structure for linking. This means it inherently relies on Node.js tools, such as TypeScript, following symlinks to resolve npm dependencies.

Historically, Bazel turned symlink inputs into actual files on remote executors. This made it incompatible with any actions that depend on symlinks when executing, such as rules_js actions when resolving transitive npm dependencies.
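The effect can be illustrated with a small sketch. It builds a tiny symlinked node_modules tree in a temporary directory and runs a deliberately simplified, Node-style lookup over it. The `.store` directory name, the package names `a` and `b`, and the `resolve` helper are all hypothetical; this is neither the actual rules_js layout nor Node's full resolution algorithm.

```python
import os
import shutil
import tempfile
from typing import Optional


def resolve(package: str, from_dir: str, root: str) -> Optional[str]:
    """Simplified lookup: take the real path of from_dir, then walk up
    toward root looking for node_modules/<package>."""
    current = os.path.realpath(from_dir)
    while True:
        candidate = os.path.join(current, "node_modules", package)
        if os.path.isdir(candidate):
            return candidate
        if current == root or os.path.dirname(current) == current:
            return None
        current = os.path.dirname(current)


def build_layout() -> str:
    """Create a tree where package a depends on package b."""
    root = os.path.realpath(tempfile.mkdtemp())
    store = os.path.join(root, "node_modules", ".store")
    real_a = os.path.join(store, "a", "node_modules", "a")
    real_b = os.path.join(store, "b", "node_modules", "b")
    os.makedirs(real_a)
    os.makedirs(real_b)
    # b is only visible as a sibling of a's real location.
    os.symlink(real_b, os.path.join(store, "a", "node_modules", "b"))
    # The application sees a through a top-level symlink.
    os.symlink(real_a, os.path.join(root, "node_modules", "a"))
    return root


root = build_layout()
top_level_a = os.path.join(root, "node_modules", "a")
with_symlink = resolve("b", top_level_a, root)

# Simulate the symlink being turned into a plain directory copy.
os.unlink(top_level_a)
shutil.copytree(os.path.join(root, "node_modules", ".store", "a", "node_modules", "a"), top_level_a)
as_copy = resolve("b", top_level_a, root)

print("b found via symlink:", with_symlink is not None)
print("b found via copy:", as_copy is not None)
shutil.rmtree(root)
```

With the symlink in place, the lookup starts from the real location of `a` and finds its sibling `b`. Once the symlink is replaced by a copy, the lookup starts from the copy and the transitive dependency is no longer reachable.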

Thanks to recent work by Fabian Meumertzheim, symlinks are now supported with remote execution in Bazel 5.3.0. The last change outstanding for rules_js to work with remote execution is currently in review. This is a fix to keep unresolved symlinks relative in the sandbox & the runfiles trees.

If you want to try remote execution with rules_ts before the last fix lands in Bazel core, follow the instructions here.

Developer Productivity and Build Times

To benchmark remote execution with rules_ts we increased the number of actions used in the original rules_ts benchmarks by 20x so that a full clean build took around 20 minutes locally on a MacBook Pro. This is meant to represent a large-scale enterprise application.

Twenty minutes of waiting on build & test is a representative threshold at which many companies might start to consider bringing remote execution into their Bazel configuration to decrease build & test times and make their developers more productive.

Long waits on build & test can really kill developer productivity at an organization. At twenty minutes, a developer can only iterate three times an hour at best. When working on a difficult problem, quick iterations are crucial to flow and result in faster & better solutions. At twenty minutes, a developer is likely to context switch to other work rather than waiting for the next results. A developer may also settle on a less than ideal solution just so she can move past a problem rather than continuing to iterate slowly.

Benchmarks

The benchmarks used for this post were run against a generated TypeScript code base that mimics a large enterprise scale. It has 100 features, 10 modules per feature, 10 components per module and 1001 lines of code per component. This makes for a total of 11,100 TypeScript files containing over 10 million lines of TypeScript in aggregate. That is a lot of TypeScript code!

For the Bazel build, each module maps to one Bazel target, for a total of 1100 ts_project/ts_library targets.
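The totals above can be checked with a few lines of arithmetic. The split of the 11,100 files into components plus one extra file per module and per feature, and the split of the 1100 targets into one per module plus one per feature, are assumptions that happen to match the stated totals; the post does not spell them out.

```python
FEATURES = 100
MODULES_PER_FEATURE = 10
COMPONENTS_PER_MODULE = 10
LINES_PER_COMPONENT = 1001

modules = FEATURES * MODULES_PER_FEATURE
components = modules * COMPONENTS_PER_MODULE
component_lines = components * LINES_PER_COMPONENT

# Assumption: one additional TypeScript file per module and per feature.
files = components + modules + FEATURES
# Assumption: one Bazel target per module plus one per feature.
targets = modules + FEATURES

print(f"modules={modules:,} components={components:,}")
print(f"files={files:,} targets={targets:,} component_lines={component_lines:,}")
```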

A timestamp is written to each generated TypeScript source file in this benchmark to intentionally cause cache misses, so that actions are forced to re-run.
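As a rough sketch of why this works: Bazel identifies action inputs by their content digests, so changing even one line of a source file means a previously cached result no longer matches. The generator below is hypothetical and only stands in for the real benchmark generator, whose code is not shown in this post.

```python
import hashlib
import time


def render_component(name: str, lines: int, stamp: float) -> str:
    """Return TypeScript source text whose first line embeds the timestamp."""
    header = f"// generated at {stamp!r}\n"
    body = "".join(f"export const {name}_{i} = {i};\n" for i in range(lines))
    return header + body


def digest(source: str) -> str:
    """Content digest of the generated source."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


now = time.time()
first = render_component("cmp", 3, stamp=now)
second = render_component("cmp", 3, stamp=now + 1.0)
print("digests equal:", digest(first) == digest(second))
```

The two files differ only in the timestamp comment, yet their digests differ, which is enough to force the dependent actions to re-run.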

Hardware

These benchmarks were run on a MacBook Pro (16-inch 2019), 2.4 GHz 8-Core Intel Core i9 (16 logical cores), 64 GB 2667 MHz DDR4 running macOS Monterey 12.5.1.

The remote execution cluster, provided by our partners at EngFlow, was made up of 100 executors on AWS c6i.xlarge instances.

Versions of TypeScript and rule sets used were,

GitHub Actions Hosts

We also ran the benchmark on standard GitHub Actions machines with 2 cores and 7 GB of RAM. These machines were not powerful enough to run local actions in comparable times or without running out of memory, so only remote execution actions were benchmarked on GitHub Actions hosts.

The ability to run a large Bazel build on relatively small machines is one of the benefits of Bazel remote execution. Instead of allocating CI machines with hundreds of cores, you can instead run your build on very small CI hosts backed by a large auto-scaling remote execution cluster. If tuned well, this configuration can result in significant cost savings on compute.

JavaScript Rule Sets

rules_js

rules_js is a high-performance and more compatible spin-off from rules_nodejs and is the result of everything the maintainers of rules_nodejs learned over the years.

rules_js, which reached 1.0.0 less than a month ago, is already in use by many companies that we've talked to. We've seen a lot of interest on the #javascript channel on Bazel Slack and the rule set already has over 100 stars on GitHub.

Now, rules_ts and rules_js finally make remote execution possible in a high-performance JavaScript rule set, while taking a more compatible approach to integrating with Node.js tools.

rules_nodejs

For historical context, while remote execution was possible in some configurations with rules_nodejs, it has never worked well.

Originally, the Node.js toolchain was difficult to use if the host & execution platform did not match. This is the case, for example, if you run Bazel locally on a MacBook but the remote execution cluster uses Linux executors. This issue has now been fixed in the rules_nodejs toolchain layer, which is shared between rules_js and rules_nodejs rules.

Next, rules_nodejs historically enumerated all files in every npm dependency as individual inputs. This resulted in hundreds of thousands of input files to actions in large projects that noticeably slowed down sandbox and runfiles tree creation. When rules_nodejs was updated to use source directories for npm dependency inputs, this resolved the excessive number of inputs, but the optimization was not compatible with remote execution since remote execution does not support source directory inputs.

Finally, rules_nodejs was updated to use declared directories for npm dependencies. This made it compatible with remote execution, but the additional overhead of making a directory copy of each npm dependency was a noticeable performance hit to the already slow, eager npm dependency fetching & linking. rules_nodejs also still suffered from all the other problems inherent in its in-action runtime linker.

Full builds vs. "devserver" builds

In these benchmarks we measure two different scenarios:

  1. A full clean build (bazel build ...) followed by an incremental bazel build ... after making a change to a leaf TypeScript file.

  2. A clean "devserver" build (bazel build :devserver), which emulates a typical developer workflow of building while running a tool such as a devserver, followed by an incremental bazel build :devserver after making a change to a leaf TypeScript file.

The "devserver" scenario is an important measure that emulates the typical local development workflow of coding while running tools such as a devserver or a test runner such as jest. These tools are often run in watch mode while making changes to source code. The faster builds are on changes, the shorter the round trip to get feedback on those changes.
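For readers who want to reproduce this kind of measurement, the two scenarios can be timed with a small harness along these lines. This is a sketch under assumptions, not the harness used for this post: `time_scenario` and `edit_leaf` are hypothetical names, and the demo uses a no-op stand-in command so that it runs without a Bazel workspace.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence, Tuple


def time_command(cmd: Sequence[str]) -> float:
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(list(cmd), check=True)
    return time.perf_counter() - start


def time_scenario(
    clean_cmd: Sequence[str],
    build_cmd: Sequence[str],
    edit_leaf: Callable[[], None],
) -> Tuple[float, float]:
    """Return (clean build seconds, incremental build seconds)."""
    subprocess.run(list(clean_cmd), check=True)  # discard earlier outputs first
    clean_s = time_command(build_cmd)
    edit_leaf()  # hypothetical hook that modifies one leaf TypeScript file
    incremental_s = time_command(build_cmd)
    return clean_s, incremental_s


# Stand-in command so the sketch runs anywhere.
noop = [sys.executable, "-c", "pass"]
clean_s, incremental_s = time_scenario(noop, noop, lambda: None)
print(f"clean {clean_s:.2f}s, incremental {incremental_s:.2f}s")
```

In a real workspace, build_cmd would be `["bazel", "build", "..."]` for the full build or `["bazel", "build", ":devserver"]` for the devserver scenario. Using `["bazel", "clean"]` as clean_cmd is an assumption about how a clean state is reached.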

Ideal build times to maximize developer productivity are less than 1 second on changes to leaf nodes and less than 10 seconds on changes that affect large parts of the graph. With a good dependency graph, these ideal times can be preserved even as the project grows.

ts_project vs. ts_library

ts_project was originally developed in rules_nodejs as an alternative to ts_library to provide a cleaner API better suited for the many ways TypeScript is used outside of Google. While the API was better suited for the wild, it could not compete with ts_library, a heavily optimized and deeply integrated wrapper around the TypeScript compiler, on performance.

The new ts_project from rules_ts has significantly reduced the performance gap with ts_library by adding first-class support for Bazel workers and now support for remote execution. We'll refer to ts_project from rules_ts as simply ts_project in this blog post. The original ts_project from rules_nodejs we'll refer to as the @bazel/typescript ts_project.

rules_js, which rules_ts is layered on, made first-class worker support in rules_ts possible by doing away with the dynamic runtime node_modules linking that rules_nodejs uses.

In these benchmarks, we'll measure both ts_project rules configured with swc as the transpiler. swc is an order of magnitude faster than TypeScript for pure transpilation, but it does not type-check, so TypeScript is still used for type checking in this split configuration.

The split configuration also removes type checking from the build graph for devserver and test targets, so only transpilation is needed to build them, reducing the round-trip-time on changes when running such targets by an order of magnitude. Type checking is handled in separate targets that can be run explicitly or with the catch-all bazel build ....

Results

Here are the results of the benchmarks.

Full clean build

[Figure: full clean build benchmark results]

Fastest: ts_project + swc with remote execution

The fastest full clean build we observed was on the MacBook Pro host with ts_project + swc and remote execution. This full transpile & type-check of 10M lines of TypeScript code took just 2 minutes and 13 seconds, 8.4x faster than the equivalent build without remote execution running on the MacBook host, which took 18 minutes and 35 seconds.

In the ts_project + swc configuration, we ran the 11,100 swc transpile actions (one per TypeScript file) locally on the 16 MacBook Pro cores while the remote execution cluster ran the TypeScript type-check and declaration file emit actions. Transpiling with swc is so fast and short-lived that running locally is faster than using remote execution due to network latency and overhead of uploading inputs and downloading outputs.
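A toy model shows why very short actions can favor local execution. It assumes actions run in waves across a fixed number of parallel slots and that every remotely executed action pays a fixed overhead for network latency and input/output transfer. The per-action durations, the overhead value and the one-slot-per-executor mapping are hypothetical; only the action counts and the 16 and 100 slot counts come from this post.

```python
import math


def wall_seconds(actions: int, action_s: float, slots: int, overhead_s: float = 0.0) -> float:
    """Toy model: actions run in waves of `slots`; each action pays overhead_s."""
    waves = math.ceil(actions / slots)
    return waves * (action_s + overhead_s)


# Hypothetical: very short transpile actions, one per TypeScript file.
swc_local = wall_seconds(11_100, action_s=0.05, slots=16)
swc_remote = wall_seconds(11_100, action_s=0.05, slots=100, overhead_s=0.5)

# Hypothetical: long type-check actions, one per target.
tsc_local = wall_seconds(1_100, action_s=10.0, slots=16)
tsc_remote = wall_seconds(1_100, action_s=10.0, slots=100, overhead_s=0.5)

print(f"short actions: local {swc_local:.1f}s vs remote {swc_remote:.1f}s")
print(f"long actions:  local {tsc_local:.1f}s vs remote {tsc_remote:.1f}s")
```

Under these assumptions the fixed overhead dominates the short actions, so they finish sooner locally, while the long actions benefit from the larger pool of executors.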

Runners-up: ts_project & ts_library with remote execution on GitHub Actions host

Tied for 2nd fastest builds are ts_project (without swc) with remote execution on a standard GitHub Actions 2 core host and ts_library with remote execution on the same host.

The ts_project build took only 2 minutes and 29 seconds, 7.5x faster than the equivalent build without remote execution running on the MacBook host, which took 18 minutes and 45 seconds. The ts_library build was virtually identical at 2 minutes and 31 seconds.
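The speedup figures quoted in these results follow directly from the measured times. A short check, using only the durations stated in this section (helper names are illustrative):

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + seconds


def speedup(local_s: float, remote_s: float) -> float:
    """How many times faster the remote build was than the local build."""
    return local_s / remote_s


swc_speedup = speedup(to_seconds(18, 35), to_seconds(2, 13))  # ts_project + swc
tsc_speedup = speedup(to_seconds(18, 45), to_seconds(2, 29))  # ts_project without swc

print(f"ts_project + swc: {swc_speedup:.2f}x")
print(f"ts_project:       {tsc_speedup:.2f}x")
```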

Running on GitHub Actions, with all actions executing remotely, was faster than running the same configuration on the MacBook host, which took 3 minutes and 2 seconds. The faster build times on GitHub Actions were due to the superior network connection on GitHub Actions machines compared to the MacBook host running in my home office. TL;DR: when you're using remote execution, your uplink speed to the remote execution cluster matters.

Incremental full builds

[Figure: incremental full build benchmark results]

Interestingly, ts_library has the best incremental full build time at 4.1s with its heavily optimized workers. @bazel/typescript ts_project with swc is the runner-up at 7.5s. ts_project with swc was a close third, taking 8.0s. In the smaller original rules_ts benchmark, ts_project with swc was 2nd fastest after ts_library.

Incremental builds with remote execution were slower than local in this scenario as there were not many actions to run and the network overhead of using remote execution made the overall time slower.

tsc is quite a bit slower than every other configuration in this benchmark for incremental builds. It is configured as a single project. In a real-world scenario, you would likely split it into multiple invocations and use TypeScript project references between them, which may result in faster build times.

Clean "devserver" builds

[Figure: clean "devserver" build benchmark results]

As with the original rules_ts benchmark, clean "devserver" transpile-only builds are fastest when the build is configured to use swc as local actions.

ts_project + swc was observed to take just 47 seconds. With @bazel/typescript ts_project, which can also be configured to use swc, the measured time was comparable at 57 seconds.

Running swc actions on remote executors is a de-optimization since the added time of network latency and upload/download time adds more overhead than the benefit of more executors when on the MacBook Pro host with 16 cores.

Incremental "devserver" builds

[Figure: incremental "devserver" build benchmark results]

Like clean "devserver" builds, the most performant incremental "devserver" builds are the ones that use swc for transpilation with local actions. Both ts_project rules with swc clocked around 2 seconds. ts_library without remote execution was comparably fast at 3.2s.

Remote execution for incremental "devserver" builds was slower than local execution due to the additional network latency and upload/download time.

The Bottom Line

Remote execution with rules_ts on large TypeScript projects can make an order of magnitude impact on large build times. Your developers will be more productive and thank you when their wait time for large builds is reduced by 10x or more. As your project grows, remote execution makes it possible to easily scale your build compute horizontally to keep large builds fast.

For very fast incremental builds with few actions, however, developers may get faster build times by keeping actions running locally, depending on how fast their connection is to the remote execution cluster. Cloud-hosted development environments may solve this discrepancy in the future, since the developer machines can be located very close to the remote execution cluster to keep network overhead at a minimum.