Optimize for Speed and Cost
The goal in this stage is to tune both the code in the repository and the Workflows configuration, searching for a sweet spot that delivers very fast build & test results while reducing the cloud cost required to operate CI.
Turn on "Build without the bytes"
This is the term of art for a Bazel setup in which outputs are never downloaded back to the machine where bazel is running.
Instead, they are stored only in the remote cache.
This feature is enabled by default in Bazel 7: blog post.
Aspect recommends upgrading to Bazel 7 if possible, as this feature became more stable in that release.
To enable it on Bazel 5 or Bazel 6, add build --remote_download_minimal to your .aspect/workflows/bazelrc file.
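For Bazel 5 or 6, the resulting .aspect/workflows/bazelrc entry looks roughly like the following sketch (the file path and flag come from the text above; the comment is illustrative):

```
# .aspect/workflows/bazelrc
# Keep intermediate outputs in the remote cache only; never download
# them back to the CI runner.
build --remote_download_minimal
```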
High cache-hit rate
Non-determinism is the property of a build step where the output varies even when inputs are the same.
Under Bazel, this causes downstream cache misses, as those changed outputs bust the cache key.
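As a contrived illustration (not from any real repository), a genrule that embeds a timestamp produces a different output on every execution, so every downstream action consuming it misses the cache:

```
# BUILD.bazel -- a deliberately non-deterministic rule (illustrative only)
genrule(
    name = "stamp",
    outs = ["stamp.txt"],
    # `date` yields a new value on each run, so this output -- and the
    # cache key of everything depending on it -- changes even when no
    # source file changed.
    cmd = "date > $@",
)
```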
There are several ways to check the cache hit rate:
- If the Grafana dashboard is set up, check the cache hit rate meter. We expect over 95%.
- Look for a line at the end of Bazel output like Executed 1 out of 347 tests: 345 tests pass and 2 were skipped. The "executed" number should be low on most runs.
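As a rough sketch, the process summary near the end of a saved Bazel log can also be parsed to approximate the hit rate. The log line below is an invented sample; a real line lists whichever execution strategies your build actually used:

```shell
# Approximate cache-hit rate from a Bazel process summary line (sample data).
line='INFO: 347 processes: 345 remote cache hit, 2 linux-sandbox.'
total=$(echo "$line" | sed -n 's/INFO: \([0-9]*\) processes.*/\1/p')
hits=$(echo "$line" | sed -n 's/.*[^0-9]\([0-9]*\) remote cache hit.*/\1/p')
echo "cache hit rate: $((100 * hits / total))%"
```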
To identify causes of non-determinism, collect Bazel's execution logs for two similar builds and compare them.
The bazel_debug_assistance option will gather execution logs.
Only enable this while debugging, as it causes builds to be slower and use a lot of disk.
```yaml
- build:
    bazel:
      bazel_debug_assistance: true
```
Once this option is enabled, execution logs from two separate Bazel runs must be downloaded.
One way to accomplish this is to merge a commit with this option enabled, letting a main branch run produce the output.
The log will appear among the artifacts produced by the Workflows run, with the prefix exec., and is usually a large file.
The next step is to gather a second log. Ideally it should not include any source code changes that cause legitimate cache misses, so the CI system can simply be triggered as a "retry" at the same commit as the first log. It's also best to ensure a different runner is used: terminate all runners, wait for the pool to scale in to zero, or even retry a build from the previous day.
After downloading two execution logs, common instructions for comparing them are provided in the Bazel documentation.
Aspect plans to add first-class determinism checking support, including a built-in execution log comparison.
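The comparison itself boils down to diffing the two logs once they are in a stable text form (the conversion tooling is described in the Bazel documentation). A minimal sketch, using two invented log fragments in place of real converted logs:

```shell
# Sketch: diff two execution-log fragments. The contents are invented
# stand-ins for the converted logs downloaded from CI.
cat > exec1.log <<'EOF'
target: //app:bin
input digest: 1111aaaa
EOF
cat > exec2.log <<'EOF'
target: //app:bin
input digest: 2222bbbb
EOF
# A differing input digest for the same target points at a
# non-deterministic action somewhere upstream.
diff exec1.log exec2.log || true
```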
Right-size instance types
Look through the instance types available through your cloud provider, taking into account the availability in your region or partition.
Cheaper instances with too few resources lead to slow builds or out-of-memory failures; instances with generous resources, however, are usually expensive.
Aspect recommends performing some experiments by doubling or halving instance sizes to "binary search" for an ideal trade-off between speed and cost.
Consider different CPU architectures as well - ARM machines are generally less expensive.
Enable "rebase" branch freshness
Bazel's performance is severely degraded when a warm machine must sync to a version control state that invalidates expensive cache entries.
To avoid this, the update_strategy attribute can rebase Pull Requests onto the most recent commit of the target branch.
See Configuration.
Note that a secret may be required for interactions with Version Control.
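Assuming the attribute sits in the Workflows configuration file alongside the other settings (the exact placement and accepted values are in the Configuration reference; the file path and value below are a sketch), the setting looks roughly like:

```yaml
# .aspect/workflows/config.yaml (placement is a sketch; see Configuration)
update_strategy: rebase
```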
Warming
When the runner pool scales out, new machines are booted up to run bazel workloads.
If these machines are cold, the build and test will be much slower.
A dedicated page for warming setup is on the way.
GitHub webhook
To scale the worker pool, Workflows polls the CI system for new workflow runs every minute. This polling delay means scale-out of new workers takes longer than necessary; you can avoid it by setting up a webhook from GitHub.
In the GitHub settings for the repository, add a new webhook.
Navigate to https://github.com/[org]/[repo]/settings/hooks/new, then fill in the form with the following values:
- Payload URL: Find the URL for the scaling function.
  - AWS: In the AWS Console, navigate to Lambda > Functions and select the lambda named aw_{CI_SHORTFORM_NAME}_scaling_webhook__{RUNNER_GROUP_NAME}. Once the lambda is selected, you can find the URL under Function URL.
    - It looks similar to https://hn12345678r2s7q5eb33nc5nca0rnhip.lambda-url.us-east-2.on.aws/.
  - GCP: In the GCP Console, navigate to Cloud Run Functions and select the function named {CI_SHORTFORM_NAME}-scaling-webhook--{RUNNER_GROUP_NAME}. Once the function is selected, you can see the URL.
    - It looks similar to https://{REGION}-{YOUR_PROJECT_NAME}.cloudfunctions.net/{CI_SHORTFORM_NAME}-scaling-webhook--{RUNNER_GROUP_NAME}.
- Content Type: Select application/json.
- Secret: Generate using your company's secrets policy.
- Which events would you like to trigger this webhook?: Choose "Let me select individual events", then "Workflow jobs" and "Workflow runs".
Copy the generated secret into Secrets Manager:
AWS:
- Navigate to AWS Console > AWS Secrets Manager > Secrets.
- Locate the key starting aw_{CI_SHORTFORM_NAME}_lambda_webhook_secret__{RUNNER_GROUP_NAME}_xxxxxxxxxxxxxxxx.
- Set the value to the generated secret.
GCP:
- Navigate to GCP Console > Secret Manager.
- Locate the key starting aw_{CI_SHORTFORM_NAME}_webhook_secret_{RUNNER_GROUP_NAME}_xxxxxxxxxxxxxxxx.
- Set the value to the generated secret.
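Once both the webhook and the stored secret are in place, the endpoint can be sanity-checked by signing a request the way GitHub signs deliveries (an HMAC-SHA256 of the body in the X-Hub-Signature-256 header). The secret, body, and URL below are placeholders, not values from this setup:

```shell
# Sign a sample payload the way GitHub webhook deliveries are signed.
SECRET='example-webhook-secret'   # placeholder; use the real stored secret
BODY='{"action":"queued"}'
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')"
echo "$SIG"
# Then POST it to the scaling function URL (placeholder variable shown):
# curl -X POST "$WEBHOOK_URL" \
#   -H 'Content-Type: application/json' \
#   -H 'X-GitHub-Event: workflow_job' \
#   -H "X-Hub-Signature-256: $SIG" \
#   -d "$BODY"
```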