Writing Custom Rules

In the 101 course, we learned how to write .bzl files containing a simple "pre-processor" construct called a Macro, which let us give users a good BUILD-file authoring experience.

Most of the time, this is all you need. But many tools need a "ruleset" to integrate with Bazel, and rulesets are built from custom rules. This section dives into that deeper problem.

Introducing custom rules

First, recall that an Action is a transformation from some inputs to some outputs, by spawning a tool.

Definition

A "Rule" extends Bazel to understand how to produce an action sub-graph from the user's dependency graph.

A rule is declared by assigning the result of rule() to a public (not underscore-prefixed) global variable in a .bzl file. This statement is evaluated during the Loading phase:

my_rule = rule(
    implementation = _my_rule_impl,
    attrs = { ... },
    ...
)

The "implementation" function is called during the Analysis phase to determine which actions are produced.
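To make the shape concrete, here is a minimal sketch of a complete rule whose implementation registers a single action. The names hello.bzl, hello and greeting are illustrative only. The Analysis phase merely records the action; Bazel executes it only if something requests its output.

```starlark
# hello.bzl: a hypothetical minimal rule with exactly one action.

def _hello_impl(ctx):
    # Declare the file this target will produce, named after the target.
    out = ctx.actions.declare_file(ctx.label.name + ".txt")

    # Register the action. Arguments are visible to the command as $1, $2, ...
    ctx.actions.run_shell(
        outputs = [out],
        arguments = [ctx.attr.greeting, out.path],
        command = 'echo "$1" > "$2"',
    )

    # Tell Bazel which files `bazel build` should produce for this target.
    return [DefaultInfo(files = depset([out]))]

hello = rule(
    implementation = _hello_impl,
    attrs = {
        "greeting": attr.string(default = "Hello, Bazel!"),
    },
)
```

In a BUILD file, load(":hello.bzl", "hello") followed by hello(name = "greeting") would let bazel build produce greeting.txt.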

Features:

Output Groups: Multiple named sets of outputs
Can run multiple actions. Which actions run depends on which outputs are requested.
Inter-operate with other rules: "Providers"
Walk their dependency graph: "Aspects"
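A minimal sketch of the output-group and provider features, using a hypothetical GreetingInfo provider: the producer returns a default output, an extra "debug" output group, and a provider, and a second rule reads that provider from its dependency. Aspects are not shown here.

```starlark
# Hypothetical provider carrying structured data to downstream rules.
GreetingInfo = provider(
    doc = "Information other rules can read from a greeting target.",
    fields = {"message": "The greeting string"},
)

def _greeting_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    log = ctx.actions.declare_file(ctx.label.name + ".log")
    ctx.actions.write(out, ctx.attr.message)
    ctx.actions.write(log, "wrote " + out.short_path + "\n")
    return [
        # Built by default with `bazel build`.
        DefaultInfo(files = depset([out])),
        # The action writing `log` only runs when this group is requested,
        # e.g. `bazel build //pkg:target --output_groups=debug`.
        OutputGroupInfo(debug = depset([log])),
        # Structured data for rules that depend on this target.
        GreetingInfo(message = ctx.attr.message),
    ]

greeting = rule(
    implementation = _greeting_impl,
    attrs = {"message": attr.string(mandatory = True)},
)

def _shout_impl(ctx):
    # Read the provider returned by the dependency.
    msg = ctx.attr.src[GreetingInfo].message
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    ctx.actions.write(out, msg.upper())
    return [DefaultInfo(files = depset([out]))]

shout = rule(
    implementation = _shout_impl,
    # Analysis fails early if `src` does not provide GreetingInfo.
    attrs = {"src": attr.label(mandatory = True, providers = [GreetingInfo])},
)
```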

When to write a custom rule

You should always prefer existing rules, and then macros, where possible.

Take ts_project as an example. It couldn't be a macro, for several reasons:

It creates a tree of actions, which might use one tool to transpile .js outputs, and a different tool for producing TypeScript types (.d.ts files).
It requires that srcs have a JsInfo provider so that it can understand their structure.
It produces a JsInfo provider for inter-op with downstream rules that depend on it.

note

Even when Providers get in the way of "just using a macro", you can often write a tiny adapter rule and then put most of your logic in a macro, which is easier to understand.

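For example, a small adapter rule can read the ProtoInfo provider from its sources and expose those files through DefaultInfo, so that ordinary macros and genrules can consume them.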
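A minimal sketch of such an adapter, using a hypothetical rule named proto_srcs. Depending on your Bazel and protobuf versions, ProtoInfo is either a built-in symbol or has to be loaded from the protobuf module, so check which applies before using this.

```starlark
# proto_srcs.bzl (hypothetical). On newer setups you may need, for example:
# load("@protobuf//bazel/common:proto_info.bzl", "ProtoInfo")

def _proto_srcs_impl(ctx):
    # Merge the .proto files of each proto_library and of everything it imports.
    srcs = depset(transitive = [
        src[ProtoInfo].transitive_sources
        for src in ctx.attr.srcs
    ])

    # Expose them as plain files, which any rule or genrule can consume.
    return [DefaultInfo(files = srcs)]

proto_srcs = rule(
    doc = "Expose the .proto sources of proto_library targets as plain files.",
    implementation = _proto_srcs_impl,
    attrs = {
        "srcs": attr.label_list(
            providers = [ProtoInfo],
            doc = "proto_library targets to adapt.",
        ),
    },
)
```

A macro could then call proto_srcs(name = "foo_srcs", srcs = [":foo_proto"]) and pass ":foo_srcs" to a genrule, keeping the rest of its logic in the macro.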

Principles of successful rule authorship

Be very sparing in the public API you commit to. Bugs should be in the tool you call, not your rule.
Don't write a Bazel-specific tool if you can help it. Rules should wrap tools that are already mature and in wide use for the problem being solved.
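Avoid tools that need a JDK or Node.js runtime. These runtimes are designed for long-running processes, while Bazel typically starts a fresh process for each action and pays the startup cost every time. Bazel has "Persistent Workers" to keep a process alive between actions, but these introduce as many problems as they solve.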
Fetch tools as pre-built binaries. It's possible for Bazel to build them from source, but it's slow (how many times have you watched Bazel build protoc?), and it adds a failure mode for users whose compiler toolchain can't build them.

Step 1: use the template

We maintain an excellent template for new rulesets. If your rules are going to be public, this is definitely the place to start:

https://github.com/bazel-contrib/rules-template

This takes care of:

creating needed bzl_library targets
automated API documentation generation
platform-specific toolchain registration for genrule and rules
CI testing with GitHub Actions, linting with pre-commit
WORKSPACE and bzlmod usage
publish releases just by pushing a tag to the repository

Internal-only rulesets often skip the documentation, support for platforms beyond the bare necessities, and so on, for better or worse.

Step 2: Fetch toolchain

We need the tool on the user's machine. Research how the tool is currently published: we want a reliable way to fetch it from the maintainers. Look for binaries attached to GitHub releases, or artifacts published to a registry such as Maven, PyPI, or npm.

At the end of this step you should be able to bazel run the tool.
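As a sketch, assume the tool publishes a prebuilt Linux x86-64 tarball on GitHub releases containing an executable named mytool at its root. The repository name, URL and archive layout are all hypothetical. With Bzlmod, MODULE.bazel could fetch it with http_archive and wrap the binary with bazel_skylib's native_binary so it can be run:

```starlark
# MODULE.bazel (use_repo_rule requires Bazel 7+). All names and URLs are hypothetical.
# Assumes a bazel_dep on bazel_skylib is declared elsewhere in this file.
http_archive = use_repo_rule("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "mytool_linux_amd64",
    urls = ["https://github.com/example-org/mytool/releases/download/v1.0.0/mytool-linux-amd64.tar.gz"],
    # Pin the checksum published by the maintainers so the download is verified:
    # sha256 = "...",
    build_file_content = """
load("@bazel_skylib//rules:native_binary.bzl", "native_binary")

# Wrap the downloaded file so that `bazel run` can execute it.
native_binary(
    name = "mytool",
    src = "mytool",
    out = "mytool_bin",
    visibility = ["//visibility:public"],
)
""",
)
```

With that in place, bazel run @mytool_linux_amd64//:mytool -- --help should execute the tool. A real ruleset would fetch one archive per platform and register toolchains, which the rules-template already sets up.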

Step 3: Study the CLI

The "man page" for the tool you're running provides the guide for how your rule API should be formed. This allows you to make the thinnest possible layer, and ensures that documentation for the tool outside Bazel is a pretty good start to understanding how to use your rule.

Things to look for in the CLI:

Are there flags related to hermeticity, like --dont-download-stuff?
Can it accept a "flag file", so that a very long argv doesn't exceed the OS command-line length limit?
How do you specify the location of output files?
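When the tool does accept a flag file, Bazel's Args object can spill long command lines for you. A minimal sketch, assuming a hypothetical tool that understands "@file" flag files, an --output flag, and positional input paths (check the real man page before copying any of these):

```starlark
# mytool.bzl (hypothetical rule and tool CLI).

def _mytool_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")

    args = ctx.actions.args()
    # If the command line is too long, Bazel writes the arguments to a file
    # and passes "@<path>" instead. Only safe if the tool understands @files.
    args.use_param_file("@%s", use_always = False)
    # Write one argument per line in that file.
    args.set_param_file_format("multiline")
    # Tell the tool exactly where to write, inside Bazel's output tree.
    args.add("--output", out)
    # Each source File expands to its path.
    args.add_all(ctx.files.srcs)

    ctx.actions.run(
        executable = ctx.executable.tool,
        arguments = [args],
        inputs = ctx.files.srcs,
        outputs = [out],
        mnemonic = "MyTool",
    )
    return [DefaultInfo(files = depset([out]))]

mytool = rule(
    implementation = _mytool_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        # Built for the execution platform, since it runs during the build.
        "tool": attr.label(executable = True, cfg = "exec", mandatory = True),
    },
)
```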

Some tools are better accessed as a library rather than a CLI, or with an adapter/wrapper CLI around the upstream CLI.

Step 4: Create an example of usage to drive development

Make something like examples/simple/BUILD.bazel.

First, just make a genrule. This is going to be the "break glass" for users who just want to call the tool without using your rule at all.

For example:

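load("@aspect_bazel_lib//lib:testing.bzl", "assert_archive_contains")

genrule(
    name = "tar_genrule",
    srcs = [
        ":fixture1",
        "src_file",
    ],
    outs = ["1.tar"],
    # $(BSDTAR_BIN) is provided by the toolchain listed below.
    # -s '#$(BINDIR)##' strips the $(BINDIR) output-directory prefix from paths inside the archive.
    cmd = "$(BSDTAR_BIN) --create --dereference --file $@ -s '#$(BINDIR)##' $(execpath :fixture1) $(execpath src_file)",
    toolchains = ["@bsd_tar_toolchains//:resolved_toolchain"],
)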

assert_archive_contains(
    name = "test_genrule",
    archive = "1.tar",
    expected = [
        "lib/tests/tar/a",
        "lib/tests/tar/src_file",
    ],
)

This gets you to a first commit with only enough toolchain work to make the genrule run.

As you add examples, they serve as "executable documentation": they are your integration test suite and, with good comments, a walkthrough of all the ways to use your rule.

Step 5: Private API

This should generally be in the following form:

The implementation function
(optional) helper functions to form arguments or declare actions
A struct for others to build their own rule from your library

In addition, we want to use normal programming practices to break the code into functions, both for readability and re-usability.

The implementation function adapts the attribute API to the CLI of the tool:

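rule.bzl
# Declare the attributes our rule accepts in the public API
_attrs = {
    "args": attr.string_list(
        doc = "Additional flags permitted by the tool; see the man page.",
    ),
    "srcs": attr.label_list(
        allow_files = True,
        doc = "Input files passed to the tool.",
    ),
    "interesting": attr.label(
        allow_single_file = True,
        doc = "An optional extra input file.",
    ),
    # Predeclared output; the public macro chooses its name.
    "my_out": attr.output(mandatory = True),
    # Hypothetical tool label; real rulesets usually resolve the tool through a toolchain.
    "_tool": attr.label(
        default = Label("//tools:my_tool"),
        executable = True,
        cfg = "exec",
    ),
}

# A factory function to create actions.
# This might be useful for someone who wants to reuse it in some other context.
def _run_action(ctx, executable, inputs, arguments, outputs):
    # ctx.actions.run spawns the tool directly. Reach for
    # ctx.actions.run_shell(command = "...") only when you need shell features.
    ctx.actions.run(
        executable = executable,
        inputs = inputs,
        arguments = [arguments],
        outputs = outputs,
        mnemonic = "MyRule",
    )

# The implementation function takes a rule context argument and adapts the attributes to the
# form needed by the action.
def _my_rule_impl(ctx):
    inputs = list(ctx.files.srcs)
    args = ctx.actions.args()
    outputs = [ctx.outputs.my_out]

    # Hypothetical flag: use whatever the tool's CLI expects for the output path.
    args.add("--output", ctx.outputs.my_out)
    args.add_all(ctx.attr.args)

    if ctx.attr.interesting:
        args.add(ctx.file.interesting, format = "--interesting=%s")
        inputs.append(ctx.file.interesting)

    # Pass the source files as positional arguments.
    args.add_all(ctx.files.srcs)

    _run_action(
        ctx,
        executable = ctx.executable._tool,
        inputs = inputs,
        arguments = args,
        outputs = outputs,
    )

    # Must return a list of Providers.
    return [
        # This is the common provider that results in outputs printed by the CLI
        DefaultInfo(files = depset(outputs), runfiles = ctx.runfiles(files = outputs)),
    ]

# Expose a Starlark library from this module
my_lib = struct(
    attrs = _attrs,
    implementation = _my_rule_impl,
)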
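To show why the struct is useful, here is a sketch of a hypothetical downstream ruleset that reuses my_lib to build a variant of the rule returning one extra provider. The load label, ExtraInfo and flavored_rule are illustrative names, not part of any real ruleset.

```starlark
# other_rules/defs.bzl (hypothetical downstream ruleset).
load("@my_rules//my/private:rule.bzl", "my_lib")  # hypothetical label

ExtraInfo = provider(fields = {"flavor": "A string recorded by the wrapper rule"})

def _flavored_impl(ctx):
    # Reuse the library's implementation unchanged, then add our own provider.
    providers = my_lib.implementation(ctx)
    return providers + [ExtraInfo(flavor = ctx.attr.flavor)]

flavored_rule = rule(
    implementation = _flavored_impl,
    # Values loaded from another .bzl file are frozen, so build a new dict
    # with the extra attribute instead of mutating my_lib.attrs.
    attrs = dict(my_lib.attrs, flavor = attr.string(default = "vanilla")),
)
```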

Step 6: Public API

The public API nearly always wraps the rule in a macro. Since macros and rules are not distinguishable at the use-site, you can always change this later.

We wrap in a macro to provide a few benefits:

Recommended way to pre-declare outputs, so users can refer to individual outputs with a label
Lets you yield extra targets like [name].update
Can have polymorphic attribute types like "pass a list of strings OR the label of a target"
Can compose multiple rules - but take care! Macros are a leaky abstraction, and it's difficult to determine which **kwargs need to be forwarded on to each target. Best to do this only with rules you declare in this API.
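A minimal sketch of the first benefit, assuming the private rule is exported as my_rule from a hypothetical //my/private:rule.bzl and has the predeclared my_out attribute: the macro picks the output name up front, which gives that file its own label.

```starlark
# defs.bzl (hypothetical public API).
load("//my/private:rule.bzl", _my_rule = "my_rule")  # hypothetical label

def my_rule(name, out = None, **kwargs):
    _my_rule(
        name = name,
        # Default the predeclared output to "<name>.out"; users may override it.
        my_out = out or name + ".out",
        **kwargs
    )
```

With my_rule(name = "gen", srcs = ["in.txt"]) in a BUILD file, another target can depend on that single file as ":gen.out".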

If users type my_rule() in their BUILD file, we want bazel query (for example with --output=label_kind) to report the target's kind as my_rule as well. This requires a bit of juggling. In your private API, declare the rule, for example:

private/tar.bzl
tar = rule(
    doc = "Rule that executes BSD `tar`. Most users should use the [`tar`](#tar) macro, rather than load this directly.",
    implementation = tar_lib.implementation,
    attrs = tar_lib.attrs,
    toolchains = ["@aspect_bazel_lib//lib:tar_toolchain_type"],
)

Then, in the public API, load the rule under an alias so that the macro can be declared with the identical name. You should also expose the underlying rule for users who don't want to use the macro.

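defs.bzl
load("@bazel_skylib//lib:types.bzl", "types")
load("@bazel_skylib//rules:write_file.bzl", "write_file")
load("//lib/private:tar.bzl", _tar = "tar")

# mtree_spec is another rule from this ruleset (definition not shown).

tar_rule = _tar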

def tar(name, mtree = "auto", **kwargs):
    # The leading underscore implies this is a "private" target
    mtree_target = "_{}.mtree".format(name)

    # Polymorphic attribute type
    if mtree == "auto":
        mtree_spec(
            name = mtree_target,
            srcs = kwargs["srcs"],
            out = "{}.txt".format(mtree_target),
        )
    elif types.is_list(mtree):
        write_file(
            name = mtree_target,
            out = "{}.txt".format(mtree_target),
            # Ensure there's a trailing newline, as bsdtar will ignore a last line without one
            content = mtree + [""],
        )
    else:
        mtree_target = mtree

    tar_rule(
        name = name,
        mtree = mtree_target,
        **kwargs
    )
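Using the macro from a BUILD file might look like the sketch below. The load label is where aspect_bazel_lib publishes its tar macro; the package examples/app, the file main.sh and the mtree line are hypothetical.

```starlark
# examples/app/BUILD.bazel (hypothetical package)
load("@aspect_bazel_lib//lib:tar.bzl", "tar")

# mtree = "auto": the macro generates an mtree spec from srcs.
tar(
    name = "app",
    srcs = ["main.sh"],
)

# mtree as a list of lines: the macro writes them to a file with write_file.
tar(
    name = "app_explicit",
    srcs = ["main.sh"],
    mtree = [
        # <path in archive> <keywords>; content= is the file's execroot-relative path.
        "bin/main.sh uid=0 gid=0 mode=0755 type=file content=examples/app/main.sh",
    ],
)
```

Because of the alias-load, bazel query --output=label_kind //examples/app:app should report the kind as tar rather than an internal name.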

Exercise

Let's study a rule relevant to your Bazel use and see these patterns in context there.

More ambitiously, let's pair-program, adding a (simple) custom rule to the bazel-examples repository.

Someone in the class, propose a rule that might be interesting for your org.
We will spend 10-15 minutes trying to write one. In the interest of time, the instructor will present.
