
Running Tests with Bazel

Now that we're able to build the code, we should figure out whether it's working.

By the end of this section, you'll have used bazel test to run some automated tests for the language you picked.

Concepts

Bazel models tests as just a special case of running programs, where the exit code determines success. So you can think of bazel test my_test as syntax sugar for bazel build my_test && ./bazel-bin/my_test.

The "program" being run is usually a "Test Runner" from your language ecosystem such as JUnit, pytest, mocha, etc. However it can also be a simple shell script, or any other program you write. Under Bazel, it's often useful to write your own Test Runner rather than build your test in some existing test/assertion framework, since Bazel handles all the mechanics of including your test in the build process.

Control over the test process is largely left to the user's Bazel command-line flags. This includes things like:

  • --test_arg - arguments to forward to the test runner's CLI
  • --test_env - which environment variables should be included in the test's inputs (and cache key)
  • --test_output
    • --test_output=streamed - stream the test log live, so it feels like watching the test runner CLI run directly
    • --test_output=errors - the typical choice; Bazel prints the failing tests' output to stdout, otherwise you can only get it from the logs in bazel-testlogs
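Putting a few of these together on one command line (the target label is hypothetical):

    $ bazel test //mypkg:my_test \
        --test_output=errors \
        --test_env=MY_SERVICE_URL \
        --test_arg=--verbose

Here --test_env forwards the MY_SERVICE_URL variable from your shell into the test's environment, and --test_arg passes --verbose through to the test runner.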

The Encyclopedia

The Test Encyclopedia is described as "an exhaustive specification of the test execution environment" and they really mean that.

This contract between Bazel and your test runner process will often resolve a dispute over why someone's test isn't working the way they expect under Bazel. Please take a minute to scan that document right now. Really, just knowing what questions it answers can save you a bunch of time later. We'll wait.

Aspect CLI only
$ bazel docs test-encyclopedia

Common tags

Tests have size and timeout attributes.

Size is a hint about how heavyweight the test is. It implies a default timeout, and the scheduler also reserves more RAM for larger sizes.

  • small = unit test
  • medium = functional test
  • large = integration
  • enormous = e2e

The timeout is set based on the size. Size is useful for filtering, e.g. you can use --test_size_filters=small to ask Bazel to "just run the unit tests".
note

You can also filter with these flags:

  • --test_tag_filters, e.g. with =smoke to run tests with a custom tag "smoke"
  • --test_timeout_filters, e.g. with =-eternal to skip running tests that take over 15min
  • --test_lang_filters, e.g. with =js,go to run just the js_test and go_test targets
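For example, combining filters on one invocation (the package path is hypothetical):

    # Run only small tests under //services that carry the "smoke" tag,
    # skipping anything with an eternal timeout:
    $ bazel test //services/... \
        --test_size_filters=small \
        --test_tag_filters=smoke \
        --test_timeout_filters=-eternal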

Unfortunately, timeout has an undesirable default value of moderate. It should have been the shortest one, so that developers are reminded to increase it when a test times out. To help arrive at "correct" timeout values, we recommend always setting --test_verbose_timeout_warnings in .bazelrc so that developers see "timeout too long" messages.
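For example, in .bazelrc:

    # Warn when a test's timeout is far longer than its actual runtime
    test --test_verbose_timeout_warnings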

You can add these to the tags attribute of a test to change the way Bazel runs it.

  • external - the test is intentionally non-hermetic, as it tests something aside from its declared inputs. This forces the test to be executed unconditionally, regardless of the value of --cache_test_results, so the result is never cached.
  • exclusive - the test is not isolated, and can interact with other tests running at the same time. Exclusive tests are executed in serial fashion after all build activity and non-exclusive tests have been completed. They can't be run remotely either.
  • manual - essentially "skip" or "disabled" - it means a target wildcard pattern like :all or //... won't include this target. You can still run it by listing the target explicitly.
  • requires-network - declare that the test should run in a sandbox that allows network access.
  • flaky - run it up to three times when it fails. See section below.
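As a sketch, here's how a test target might combine the size and tags attributes (the names are made up):

    sh_test(
        name = "payments_integration_test",
        srcs = ["payments_integration_test.sh"],
        size = "large",     # integration test: longer default timeout, more resources
        tags = [
            "manual",             # excluded from wildcard patterns like //...
            "requires-network",   # sandbox allows network access
        ],
    )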
Aspect CLI only
$ bazel help tags
info

Run bazel help test

Coverage

You can run bazel coverage to collect coverage data for supported languages/test runners. Bazel will combine coverage reports from multiple test runners.

This area of Bazel doesn't work very well for many languages.

danger

Bazel's coverage verb sets flags which cause the analysis cache to be discarded, which can cost minutes in CI. Consider using the equivalent bazel test --collect_code_coverage --instrumentation_filter=^// instead.
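One way to apply that advice is a user-defined config in .bazelrc (the config name cover is our own invention):

    # .bazelrc: collect coverage via plain `bazel test` rather than the coverage verb
    test:cover --collect_code_coverage
    test:cover --instrumentation_filter=^//

Then CI can run bazel test --config=cover //... to collect coverage.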

Test XML

Bazel assumes that test runners can produce a test-case level reporting output in the "JUnit XML" format. These are collected in the bazel-testlogs folder.
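For example, after a bazel test run, the XML for a target //mypkg:my_test (a hypothetical label) typically lands at:

    bazel-testlogs/mypkg/my_test/test.xml    # test-case level results
    bazel-testlogs/mypkg/my_test/test.log    # the raw test log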

Other test outputs

You might want to grab screenshots or other output files that a test generates. Bazel doesn't allow tests to produce outputs the way build steps can, since tests are not build "Actions" but rather just some program being run.

You can read the environment variable TEST_UNDECLARED_OUTPUTS_DIR from your test, and write files into that folder. After Bazel finishes, you can collect the results as zip files from the bazel-testlogs folder.
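For example, a test script might stash an artifact like this (a sketch; the file name is made up):

    # Inside the test: save a screenshot alongside the normal test outputs
    mkdir -p "$TEST_UNDECLARED_OUTPUTS_DIR/screenshots"
    cp screenshot.png "$TEST_UNDECLARED_OUTPUTS_DIR/screenshots/"

After the run, these files typically show up zipped as outputs.zip under the target's folder in bazel-testlogs.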

External services

Tests often want to connect to services/datasources as part of the "system under test". Many builds are set up to do this outside the build tool, like so:

  1. Start up some services, or populate a database
  2. Run the entry point for the testing tool
  3. Clean up

Bazel does not support this model. Bazel tests are just programs that exit 0 or not, and Bazel has no "lifecycle" hooks to run some setup or teardown for specific test targets.

Of course, you could just script around Bazel, the same as in the scenario above, by starting some services before running bazel test and then shutting them down at the end. However, this doesn't work well with remote execution. It also assumes that concurrent tests will be isolated from each other when accessing the shared resource, and it means you'll start up the services even if Bazel doesn't execute any tests because they're all cache hits.

Ideally, tests are hermetic. That means they depend only on declared inputs, which are files. If a test needs to connect to a service, you could invert the model above: the test itself sets up the environment it needs and tears it down when it finishes. Testcontainers is a great library for using Docker containers as a part of the system under test.

Managing flakiness

Ideally, engineers would write deterministic tests. Not only is that unlikely to happen, it's sometimes not the best use of their time. What we all really want is for a passing test to mean everything is good, and for a failing test to not waste our time, assuming the infrastructure can get it to pass with some retries.

note

Bazel returns a special FLAKY status when a test has a mix of fail and pass.

There are two reasonable approaches for CI:

  1. Use --flaky_test_attempts=[number], commonly with a value like 2 or 3. This will run any test 1-2 additional times if it fails. This is nice since:
     • you don't have to tell Bazel which tests are flaky ahead of time
     • only CI will do the retries, while developers will locally see a failure, which might motivate them to fix the problem
     However, the downside: it increases the time to report an actually failing test to 2-3x the test's runtime.

  2. Let a single failure of a test fail the build, and tag known-flaky test targets with flaky = True. Bazel will run a flaky test up to two additional times after the first failure. The downside is that the version control system becomes the database of which tests are flaky, and that database must be maintained manually. We recommend giving the BuildCop a one-click way to mark a test as flaky (or remove it) by making a bot commit to the repository which uses Buildozer to make the BUILD file edit. We are building a GitHub bot which does exactly this.
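A sketch of both pieces, with made-up target names:

    # BUILD.bazel: mark the target so Bazel retries it on failure
    sh_test(
        name = "checkout_test",
        srcs = ["checkout_test.sh"],
        flaky = True,
    )

    # The same edit, made by a bot using Buildozer:
    $ buildozer 'set flaky True' //shop:checkout_test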

Determining if flakiness is fixed

When fixing a flaky test, it can be hard to know that the fix is right, since the test sometimes passes even with the bug. If the test's non-determinism can be reproduced locally by running it a few times, then use the flag --runs_per_test=[number] to "roll the dice" that many times.
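For example, to run a single test fifty times in one go (the label is hypothetical):

    $ bazel test //mypkg:my_test --runs_per_test=50

If all fifty runs pass after your fix, you can be reasonably confident the non-determinism is gone.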

Try it: bazel test

Try adding targets to the BUILD.bazel files to run some unit tests for the language you're working in. If there isn't already a test, try adding one, using a test runner you're familiar with.