Writing Custom Rules
In the 101 course, we learned how to write .bzl files containing a simple "pre-processor" construct called a Macro, which gave us the power to create a good authoring experience for BUILD files.
Most of the time, this is all you need. But "rulesets" must be written for many tools. This section dives into the deeper problem of custom rules.
Introducing custom rules
First, recall that an Action is a transformation from some inputs to some outputs, by spawning a tool.
A "Rule" extends Bazel to understand how to produce an action sub-graph from the user's dependency graph.
A rule is declared with an assignment statement to a public variable, which is evaluated by the Loading phase:
my_rule = rule(
    implementation = _my_rule_impl,
    attrs = { ... },
    ...
)
The "implementation" function is called during the Analysis phase to determine which actions are produced.
Features of rules:
- Output Groups: multiple named sets of outputs
- Can run multiple actions. Which actions run depends on which outputs are requested.
- Inter-operate with other rules, using "Providers"
- Walk their dependency graph, using "Aspects"
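As a rough illustration of the first two features, a rule might place one file in its default outputs and another in a named output group. The rule name `two_outputs` and the group name `extra` are made up for this sketch, and `ctx.actions.write` stands in for a real tool invocation:

```starlark
# Sketch only: `two_outputs` and the `extra` output group are hypothetical names.
def _two_outputs_impl(ctx):
    main = ctx.actions.declare_file(ctx.label.name + ".txt")
    extra = ctx.actions.declare_file(ctx.label.name + ".extra.txt")

    # Two independent actions, one per output file.
    ctx.actions.write(output = main, content = "main\n")
    ctx.actions.write(output = extra, content = "extra\n")

    return [
        # Built when the target is requested without any output group flag.
        DefaultInfo(files = depset([main])),
        # Built only when the `extra` output group is requested.
        OutputGroupInfo(extra = depset([extra])),
    ]

two_outputs = rule(implementation = _two_outputs_impl)
```

With a target of this rule, a plain `bazel build` only needs the first action, while `--output_groups=extra` requests the second file instead (and `--output_groups=+extra` requests both).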
When to write a custom rule
You should always prefer existing rules, and then macros, where possible.
Using ts_project as an example, this couldn't be a macro for several reasons:
- It creates a tree of actions, which might use one tool to transpile .js outputs, and a different tool for producing TypeScript types (.d.ts files).
- It requires that srcs have a JsInfo provider so that it can understand their structure.
- It produces a JsInfo provider for inter-op with downstream rules that depend on it.
Even when Providers get in your way of "just using a macro", you can often write a tiny adapter rule and then put most of your logic in a more easily understood macro.
For example, this code adapts a ProtoInfo on its sources to a DefaultInfo output.
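A minimal sketch of what such an adapter rule could look like follows. It is not the code referenced above: the rule name `proto_sources` is hypothetical, and the `load` line assumes a rules_proto version that exports `ProtoInfo` from that file (older Bazel versions expose it as a global instead).

```starlark
# Assumption: rules_proto is available and exports ProtoInfo here.
load("@rules_proto//proto:defs.bzl", "ProtoInfo")

def _proto_sources_impl(ctx):
    # Merge the transitive .proto sources of every dependency into one depset.
    sources = depset(transitive = [
        src[ProtoInfo].transitive_sources
        for src in ctx.attr.srcs
    ])

    # Re-expose them as plain files, which macros and genrules can consume.
    return [DefaultInfo(files = sources)]

# Hypothetical adapter rule; most of the remaining logic can then live in a macro.
proto_sources = rule(
    implementation = _proto_sources_impl,
    attrs = {
        "srcs": attr.label_list(providers = [ProtoInfo]),
    },
)
```

The adapter itself has no actions. It only translates one provider into another, so it stays small enough to read at a glance.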
Principles of successful rule authorship
- Be very sparing in the public API you commit to. Bugs should be in the tool you call, not your rule.
- Don't write a Bazel-specific tool if you can help it. Rules should wrap tools that are already mature and in wide use for the problem being solved.
- Avoid tools that need a JDK or Node.js runtime. These runtimes are meant for long-running processes. Bazel has "Persistent Workers", but these introduce as many problems as they solve.
- Fetch tools as pre-built binaries. It's possible for Bazel to build them from source, but it's slow (how many times have you watched Bazel build protoc?) and it also introduces a failure mode for users whose toolchain can't build them properly.
Step 1: use the template
We maintain an excellent template for new rulesets. If your rules are going to be public, this is definitely the place to start:
https://github.com/bazel-contrib/rules-template
This takes care of:
- creating needed bzl_library targets
- automated API documentation generation
- platform-specific toolchain registration for genrule and rules
- CI testing with GitHub Actions, linting with pre-commit
- WORKSPACE and bzlmod usage
- publishing releases just by pushing a tag to the repository
Step 2: fetch toolchain
We need the tool to be on the user's machine. Research how the tool is currently published. We want a reliable way to fetch it from the maintainers. Look for binaries published on a GitHub release, artifacts published to a repository like Maven, PyPI, NPM, etc.
At the end of this step you should be able to bazel run the tool.
Step 3: Study the CLI
The "man page" for the tool you're running provides the guide for how your rule API should be formed. This allows you to make the thinnest possible layer, and ensures that documentation for the tool outside Bazel is a pretty good start to understanding how to use your rule.
Things to look for in the CLI:
- Are there flags related to hermeticity, like --dont-download-stuff?
- Can it accept a "flag file" so that a very long argv doesn't exceed the OS limit?
- How do you specify the location of output files?
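If the tool does accept a flag file, the rule can let Bazel spill long command lines into one. The sketch below assumes a tool that reads extra arguments from `@path` and takes an `--output` flag. Both are assumptions about a hypothetical CLI, and the attribute names `tool` and `out` are also made up:

```starlark
def _my_rule_impl(ctx):
    args = ctx.actions.args()

    # Hypothetical flag naming the output location.
    args.add("--output", ctx.outputs.out)
    args.add_all(ctx.files.srcs)

    # Let Bazel decide when the command line is long enough to need a params file.
    # "@%s" is the flag-file syntax this sketch assumes the tool understands.
    args.use_param_file("@%s", use_always = False)

    # Write one argument per line, without shell quoting.
    args.set_param_file_format("multiline")

    ctx.actions.run(
        executable = ctx.executable.tool,
        arguments = [args],
        inputs = ctx.files.srcs,
        outputs = [ctx.outputs.out],
    )
    return [DefaultInfo(files = depset([ctx.outputs.out]))]

my_rule = rule(
    implementation = _my_rule_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "out": attr.output(mandatory = True),
        "tool": attr.label(executable = True, cfg = "exec", mandatory = True),
    },
)
```

Check the man page for the exact flag-file syntax and quoting rules before choosing the param file format.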
Step 4: Create an example of usage to drive development
Make something like examples/simple/BUILD.bazel.
First, just make a genrule. This is going to be the "break glass" for users who just want to call the tool without using your rule at all.
For example:
genrule(
    name = "tar_genrule",
    srcs = [
        ":fixture1",
        "src_file",
    ],
    outs = ["1.tar"],
    cmd = "$(BSDTAR_BIN) --create --dereference --file $@ -s '#$(BINDIR)##' $(execpath :fixture1) $(execpath src_file)",
    toolchains = ["@bsd_tar_toolchains//:resolved_toolchain"],
)

assert_archive_contains(
    name = "test_genrule",
    archive = "1.tar",
    expected = [
        "lib/tests/tar/a",
        "lib/tests/tar/src_file",
    ],
)
This lets you get to a first commit with minimal toolchain work so that the genrule runs.
As you add examples, they serve as "executable documentation": they are both your integration test suite, and also with good comments they walk users through all the ways to use your rule.
Step 5: Private API
This should generally be in the following form:
- The implementation function
- (opt) helper functions to form arguments, or declare actions
- A struct for others to build their own rule from your library
In addition, we want to use normal programming practices to break the code into functions, both for readability and re-usability.
The implementation function adapts the attribute API to the CLI of the tool:
# Declare the attributes our rule accepts in the public API
_attrs = {
    "args": attr.string_list(
        doc = "Additional flags permitted by the tool; see the man page.",
    ),
    "srcs": ...
}

# A factory function to create actions.
# This might be useful for someone who wants to reuse it in some other context.
def _run_action(ctx, executable, inputs, arguments, outputs):
    # At some point it calls through to ctx.actions.run, or to
    # ctx.actions.run_shell(command = "...") when a shell is unavoidable.
    ctx.actions.run(
        executable = executable,
        inputs = inputs,
        arguments = [arguments],
        outputs = outputs,
    )

# The implementation function takes a rule context argument and adapts the attributes to the
# form needed by the action.
def _my_rule_impl(ctx):
    # ctx.files.srcs holds the Files behind the srcs labels; copy the list so we can append to it
    inputs = ctx.files.srcs[:]

    # Build the command line, starting with the user-supplied flags
    args = ctx.actions.args()
    args.add_all(ctx.attr.args)
    outputs = [ctx.outputs.my_out]

    if ctx.file.interesting:
        args.add("--interesting=" + ctx.file.interesting.path)
        inputs.append(ctx.file.interesting)

    _run_action(
        ctx,
        # some_binary is a placeholder for the tool executable
        executable = some_binary,
        inputs = inputs,
        arguments = args,
        outputs = outputs,
    )

    # Must return a list of Providers.
    return [
        # This is the common provider that results in outputs printed by the CLI
        DefaultInfo(files = depset(outputs), runfiles = ctx.runfiles(files = outputs)),
    ]

# Expose a Starlark library from this module
my_lib = struct(
    attrs = _attrs,
    implementation = _my_rule_impl,
)
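The struct is what lets others build their own rule from your library. As a hedged sketch, a downstream .bzl file might extend it as shown here. The load path, the rule name `my_verbose_rule`, and the `verbose` attribute are all hypothetical:

```starlark
# Hypothetical location of the private API shown above.
load("//lib/private:my_rule.bzl", "my_lib")

def _my_verbose_rule_impl(ctx):
    # Extra behavior layered on top of the library implementation.
    if ctx.attr.verbose:
        print("Running the tool for", ctx.label)

    # Delegate the actual work, and return the providers it produces.
    return my_lib.implementation(ctx)

my_verbose_rule = rule(
    implementation = _my_verbose_rule_impl,
    # Start from the library's attributes and add one more.
    attrs = dict(my_lib.attrs, verbose = attr.bool(default = False)),
)
```

Because the wrapper reuses `my_lib.attrs`, the `ctx` it passes along has every attribute the library implementation expects.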
Step 6: Public API
The public API nearly always wraps the rule in a macro. Since macros and rules are not distinguishable at the use-site, you can always change this later.
We wrap in a macro to provide a few benefits:
- Recommended way to pre-declare outputs, so users can refer to individual outputs with a label
- Lets you yield extra targets like [name].update
- Can have polymorphic attribute types like "pass a list of strings OR the label of a target"
- Can compose multiple rules - but take care! Macros are a leaky abstraction, and it's difficult to determine which **kwargs need to be forwarded on to each target. Best to do this only with rules you declare in this API.
If users type my_rule() in their BUILD file, we always want bazel query to show my_rule.
This requires a bit of juggling. In your private API, declare the rule, for example:
tar = rule(
    doc = "Rule that executes BSD `tar`. Most users should use the [`tar`](#tar) macro, rather than load this directly.",
    implementation = tar_lib.implementation,
    attrs = tar_lib.attrs,
    toolchains = ["@aspect_bazel_lib//lib:tar_toolchain_type"],
)
Then use an alias-load statement to access it in the public API so the macro can be declared with an identical name. You should also expose the underlying rule for users who don't want to use the macro.
load("@bazel_skylib//lib:types.bzl", "types")
load("@bazel_skylib//rules:write_file.bzl", "write_file")
load("//lib/private:tar.bzl", _tar = "tar")

# mtree_spec is assumed to be defined or loaded elsewhere in this file.
tar_rule = _tar

def tar(name, mtree = "auto", **kwargs):
    # The leading underscore implies this is a "private" target
    mtree_target = "_{}.mtree".format(name)

    # Polymorphic attribute type
    if mtree == "auto":
        mtree_spec(
            name = mtree_target,
            srcs = kwargs["srcs"],
            out = "{}.txt".format(mtree_target),
        )
    elif types.is_list(mtree):
        write_file(
            name = mtree_target,
            out = "{}.txt".format(mtree_target),
            # Ensure there's a trailing newline, as bsdtar will ignore a last line without one
            content = mtree + [""],
        )
    else:
        # Anything else is treated as the label of an existing target
        mtree_target = mtree

    tar_rule(
        name = name,
        mtree = mtree_target,
        **kwargs
    )
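From the user's side, the polymorphic `mtree` attribute could be exercised roughly as follows. This is a sketch of a BUILD file, assuming the macro is loaded from `@aspect_bazel_lib//lib:tar.bzl`. The target names and the mtree line are illustrative only:

```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "mtree_spec", "tar")

# 1. Default: mtree = "auto", so the macro generates the spec from srcs.
tar(
    name = "auto",
    srcs = ["src_file"],
)

# 2. A list of strings: the macro writes them to a file, one line each.
tar(
    name = "inline",
    srcs = ["src_file"],
    mtree = [
        # Illustrative mtree line; see the bsdtar mtree format for valid keywords.
        "usr type=dir mode=0755",
    ],
)

# 3. The label of a target that already produces a spec.
mtree_spec(
    name = "my_spec",
    srcs = ["src_file"],
)

tar(
    name = "from_label",
    srcs = ["src_file"],
    mtree = ":my_spec",
)
```

In all three cases `bazel query` reports a `tar` rule, because the macro forwards to the rule declared under that name in the private API.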
Exercise
Let's pair-program! We will add a custom rule to the bazel-examples repository.
- Someone in the class, propose a rule that might be interesting for your org.
- We will spend 10-15 minutes trying to write one. In the interest of time, the instructor will present.