Configuring Bazel's Downloader

Bazel has a built-in downloader that's used for many things. It caches downloaded files by checksum in its repository cache. As long as a download specifies a checksum, you won't have to fetch it from the internet again, even if the repository rule that requested it re-runs. The downloader is also configurable, though this is undocumented.

Some repository rules run external programs, such as yarn install. These aren't aware of Bazel's downloader and do their own network fetches. The downloader is used from WORKSPACE by rules like http_archive and http_file. Repository rules can also use it through repository_ctx.download and repository_ctx.download_and_extract.

Repository rules are a source of non-hermeticity in builds, and they are a security risk when they fetch untrusted third-party code. For these reasons we recommend routing downloads through a local read-through cache like Artifactory, which makes builds more resilient to network outages. Pair it with a security scanner that flags known vulnerabilities in the packages stored in the cache.

You can tell Bazel to redirect every download made by its downloader through that read-through cache. We recommend starting on CI by adding --experimental_downloader_config=bazel_downloader.cfg to your .bazelrc.

Now you need to create that config file. There isn't any documentation, but the source code for UrlRewriterConfig will get you pretty close.
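Here's our reading of how rewrite lines behave in UrlRewriter and UrlRewriterConfig. Each pattern is matched against the whole URL with its scheme (https://) stripped. Every matching rule contributes one candidate, in file order. $1-style back-references refer to the pattern's capture groups. The Python model below reflects that reading and is an approximation for reasoning about your own config, not Bazel's code. The function names parse_rewrites and rewrite_url are our own. Reusing the original scheme for a rewritten URL that lacks one is also an assumption.

```python
import re


# A rough model of how Bazel's downloader applies `rewrite` lines, based on our
# reading of UrlRewriter/UrlRewriterConfig. It is an approximation, not Bazel's code.

def parse_rewrites(config_text):
    """Return (compiled pattern, replacement) pairs for `rewrite` lines, in file order."""
    rewrites = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, _, rest = line.partition(" ")
        if directive == "rewrite":
            pattern, replacement = rest.split(None, 1)
            rewrites.append((re.compile(pattern), replacement.strip()))
    return rewrites


def rewrite_url(url, rewrites):
    """Return the candidate URLs for `url`, or [url] if no rewrite matches."""
    scheme, _, without_scheme = url.partition("://")
    candidates = []
    for pattern, replacement in rewrites:
        match = pattern.fullmatch(without_scheme)  # the whole scheme-less URL must match
        if match is None:
            continue
        # Translate $1-style back-references into Python's \g<1> syntax.
        candidate = match.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
        if "://" not in candidate:
            candidate = f"{scheme}://{candidate}"  # assumption: reuse the original scheme
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates or [url]


if __name__ == "__main__":
    rules = parse_rewrites("""
    rewrite (mirror.bazel.build/openjdk/.*) artifactory.internal.net/artifactory/$1
    rewrite mirror.bazel.build/(.*) artifactory.internal.net/artifactory/$1
    """)
    for candidate in rewrite_url("https://mirror.bazel.build/openjdk/jdk.zip", rules):
        print(candidate)
```

Under this model, the OpenJDK URL yields two candidates. The first is https://artifactory.internal.net/artifactory/mirror.bazel.build/openjdk/jdk.zip, and the second is https://artifactory.internal.net/artifactory/openjdk/jdk.zip. This is why the example config puts its specific mirror.bazel.build rules ahead of the catch-all.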

Here's an example config to get you started.

# This file works by matching each URL against every rewrite line. Each match adds the rewritten URL (the second
# parameter) to the pool of candidates. If the pool is empty, the original URL is used. We order this file so that
# the first candidate should succeed, so we don't get any `Warnings:` in the logs.

allow s3.amazonaws.com

# For some reason the Bazel team decided that mirror.bazel.build should be the source of truth for these 3 files, so
# keep mirror.bazel.build in the path for those
rewrite (mirror.bazel.build/bazel_coverage_output_generator/.*) artifactory.internal.net/artifactory/$1
rewrite (mirror.bazel.build/bazel_java_tools/.*) artifactory.internal.net/artifactory/$1
rewrite (mirror.bazel.build/openjdk/.*) artifactory.internal.net/artifactory/$1
# For everything else, our URLs exactly match the paths mirror.bazel.build uses, so skip the indirection
rewrite mirror.bazel.build/(.*) artifactory.internal.net/artifactory/$1

# Use any of our remote repositories
rewrite (dl.google.com)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (files.pythonhosted.org)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (github.com)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (pypi.python.org)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (raw.githubusercontent.com)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (releases.llvm.org)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (repo.maven.apache.org)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (s3.amazonaws.com)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (storage.googleapis.com)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (www.python.org)/(.*) artifactory.internal.net/artifactory/$1/$2
rewrite (zlib.net)/(.*) artifactory.internal.net/artifactory/$1/$2

# These hosts serve the same content as repositories above, so instead of making more remote repositories we alias them
rewrite pypi.org/(.*) artifactory.internal.net/artifactory/pypi.python.org/$1
rewrite repo1.maven.org/(.*) artifactory.internal.net/artifactory/repo.maven.apache.org/$1

# Improved security: block every host not allowed in this file (Artifactory, plus s3.amazonaws.com above)
allow artifactory.internal.net
block *
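The allow and block lines are applied after rewriting. The Python sketch below models that step, based on our reading of UrlRewriter. It assumes that hosts are compared by suffix, that block * matches every host, and that an allow match wins over any block match. It is an illustration for checking a config, not Bazel's code. The names parse_host_lists and is_permitted are our own.

```python
from urllib.parse import urlsplit


# A rough model of the allow/block step, based on our reading of Bazel's UrlRewriter.
# Assumptions: it runs after rewriting, hosts are compared by suffix, `block *`
# matches every host, and an `allow` match wins over any `block` match.

def parse_host_lists(config_text):
    """Return the hosts named on `allow` and `block` lines."""
    allow, block = [], []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, _, rest = line.partition(" ")
        if directive == "allow":
            allow.append(rest.strip())
        elif directive == "block":
            block.append(rest.strip())
    return allow, block


def is_permitted(url, allow, block):
    """Decide whether the downloader may fetch `url` under the allow/block lists."""
    host = urlsplit(url).hostname or ""
    allowed = any(host.endswith(entry) for entry in allow)
    blocked = "*" in block or any(host.endswith(entry) for entry in block)
    return allowed or not blocked


if __name__ == "__main__":
    allow, block = parse_host_lists("""
    allow s3.amazonaws.com
    allow artifactory.internal.net
    block *
    """)
    print(is_permitted("https://artifactory.internal.net/artifactory/github.com/x", allow, block))  # True
    print(is_permitted("https://github.com/foo/bar/archive/v1.tar.gz", allow, block))  # False
```

Under this model, allow s3.amazonaws.com also lets through URLs on bucket subdomains, such as a hypothetical my-bucket.s3.amazonaws.com. The (s3.amazonaws.com)/(.*) rewrite doesn't match those URLs, so they would be fetched directly rather than through Artifactory.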