Merge pull request #11944 from protocolbuffers/gha-port-22.x

Backport GHA fixes and optimizations to 22.x

pull/11946/head

commit b7f7171c31

15 changed files with 397 additions and 196 deletions

@@ -0,0 +1,204 @@

This directory contains all of our automatically triggered workflows.

# Test runner

Our top-level `test_runner.yml` is responsible for kicking off all tests, which
are represented as reusable workflows. This is carefully constructed to satisfy
the design laid out in go/protobuf-gha-protected-resources (see below), and
duplicating it across every workflow file would be difficult to maintain. As an
added bonus, we can manually dispatch our full test suite with a single button
and monitor the progress of all of them simultaneously in GitHub's actions UI.
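
As a rough sketch of this pattern, the runner can delegate each suite to a
reusable workflow through a job-level `uses:` and forward its secrets with
`secrets: inherit`. The callee file names `test_cpp.yml` and `test_python.yml`
are hypothetical, not necessarily our actual workflows.

```yaml
# Sketch of test runner jobs that delegate to reusable workflows.
jobs:
  cpp:
    name: C++
    # Hypothetical callee; it must declare `on: workflow_call`.
    uses: ./.github/workflows/test_cpp.yml
    # Forward this workflow's secrets to the callee.
    secrets: inherit
  python:
    name: Python
    uses: ./.github/workflows/test_python.yml   # hypothetical
    secrets: inherit
```

Because every suite hangs off the same runner, a single `workflow_dispatch` of
the runner starts all of them at once.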

There are five ways our test suite can be triggered:

- **Post-submit tests** (`push`): These are run over newly submitted code
  that we can assume has been thoroughly reviewed. There are no additional
  security concerns here, and these jobs can be given highly privileged access
  to our internal resources and caches.

- **Pre-submit tests from a branch** (`pull_request`): These are run over
  every PR as changes are made. Since they come from branches in our
  repository, they have secret access by default and can also be given highly
  privileged access. However, we expect *many* of these events per change,
  and likely many from abandoned or exploratory changes. Given the much higher
  frequency, we restrict the ability to *write* to our more expensive caches.

- **Pre-submit tests from a fork** (`pull_request_target`): These are run
  over every PR from a forked repository as changes are made. These have much
  more restricted access, since they could be coming from anywhere. To protect
  our secret keys and our resources, tests will not run until a commit has been
  labeled `safe to submit`. Further commits will require further approvals to
  run our test suite. Once marked as safe, we will provide read-only access to
  our caches and Docker images, but will generally disallow any writes to shared
  resources.

- **Continuous tests** (`schedule`): These are run on a fixed schedule. We
  currently run them daily. They can help identify non-hermetic issues in tests
  that don't get run often, whether because of test caching or because of slow
  periods like weekends and holidays. Similar to post-submit tests, these are
  run over submitted code and are highly privileged in the resources they can
  use.

- **Manual testing** (`workflow_dispatch`): Our test runner can be triggered
  manually over any branch. This is treated similarly to pre-submit tests, and
  these runs are highly privileged because only the protobuf team can trigger
  them.
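
A trigger block covering these five events might look roughly like the
following. The branch filter, cron time, and `types` list are illustrative
assumptions, not our actual settings; the `if:` condition shows one way to keep
branch PRs on `pull_request`, fork PRs on `pull_request_target`, and gate the
latter on the `safe to submit` label using GitHub's `labels.*.name` object
filter.

```yaml
on:
  push:                 # post-submit
    branches: [main]    # illustrative branch filter
  pull_request:         # pre-submit from a branch in this repository
  pull_request_target:  # pre-submit from a fork; runs the base branch's workflows
    types: [opened, synchronize, labeled]
  schedule:             # continuous tests
    - cron: '0 10 * * *'   # illustrative: daily at 10:00 UTC
  workflow_dispatch:    # manual runs from the Actions UI

jobs:
  gate:
    runs-on: ubuntu-latest
    # Branch PRs run under pull_request; fork PRs run under
    # pull_request_target and also need the label. Other events always pass.
    if: >-
      (github.event_name != 'pull_request' &&
      github.event_name != 'pull_request_target') ||
      (github.event_name == 'pull_request' &&
      github.event.pull_request.head.repo.full_name == github.repository) ||
      (github.event_name == 'pull_request_target' &&
      github.event.pull_request.head.repo.full_name != github.repository &&
      contains(github.event.pull_request.labels.*.name, 'safe to submit'))
    steps:
      - run: echo "Running tests for a ${{ github.event_name }} event"
```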

# Staleness handling

While Bazel handles code generation seamlessly, we do support build systems that
don't. There are a handful of cases where we need to check in generated files
that can become stale over time. In order to provide a good developer
experience, we've implemented a system to make this more manageable.

- Stale files should have a corresponding `staleness_test` Bazel target. This
  should be marked `manual` to avoid getting picked up in CI, but will fail if
  files become stale. It also provides a `--fix` flag to update the stale files.

- Bazel tests will never depend on the checked-in versions, and will generate
  new ones on the fly during the build.

- Non-Bazel tests will always regenerate necessary files before starting. This
  is done using our `bash` and `docker` actions, which should be used for any
  non-Bazel tests. This way, no tests will fail due to stale files.

- A post-submit job will immediately regenerate any stale files and commit them
  if they've changed.

- A scheduled job will run late at night every day to make sure the post-submit
  is working as expected (that is, it will run all the staleness tests).

The `regenerate_stale_files.sh` script is the central script responsible for all
the regeneration of stale files.
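
The post-submit step could be sketched as a job that runs the script and
commits only when something changed. The script's location at the repository
root, the branch filter, the commit identity, and the use of `git diff --quiet`
to detect changes are assumptions for illustration.

```yaml
on:
  push:
    branches: [main]   # illustrative

jobs:
  regenerate:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # needed to push the regenerated files
    steps:
      - uses: actions/checkout@v3
      - name: Regenerate stale files
        # Assumed location of the script at the repository root.
        run: ./regenerate_stale_files.sh
      - name: Commit any changes
        run: |
          # `git diff --quiet` exits 0 when tracked files are unchanged.
          if git diff --quiet; then
            echo "Nothing is stale."
            exit 0
          fi
          git config user.name "Protobuf Team Bot"   # hypothetical identity
          git config user.email "bot@example.com"    # hypothetical identity
          git commit -am "Auto-generate files after ${GITHUB_SHA}"
          git push
```

Pushes made with the default `GITHUB_TOKEN` do not trigger further workflow
runs, so a job like this does not re-trigger itself.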

# Forked PRs

Because we need secret access to run our tests, we use the `pull_request_target`
event for PRs coming from forked repositories. We do check out the code from the
PR's head, but the workflow files themselves are always fetched from the *base*
branch (that is, the branch we're merging to). Therefore, any changes to these
files won't be tested, so we explicitly ban PRs that touch these files.
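
In workflow terms, this means checking out the PR head explicitly, since under
`pull_request_target` the default checkout is the base branch. A minimal sketch
with `actions/checkout`; the job layout is an assumption:

```yaml
on:
  pull_request_target:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Without `ref`, pull_request_target checks out the base branch.
      # The workflow file itself still comes from the base branch either way.
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
```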

# Caches

We have a number of different caching strategies to help speed up tests. These
live either in GCP buckets or in our GitHub repository cache. The former has
a lot of resources available, so we don't have to worry as much about bloat.
On the other hand, the GitHub repository cache is limited to 10GB and will
start pruning old caches when it exceeds that threshold. Therefore, we need
to be very careful about the size and quantity of our caches in order to
maximize the gains.

## Bazel remote cache

As described in https://bazel.build/remote/caching, remote caching allows us to
offload a lot of our build steps to a remote server that holds a cache of
previous builds. We use our GCP project for this storage, and configure
*every* Bazel call to use it. This provides substantial performance
improvements at minimal cost.

We do not allow forked PRs to upload updates to our Bazel caches, but they
do use them. Every other event is given read/write access to the caches.
Because Bazel behaves poorly under certain environment changes (such as
changes to the toolchain or operating system), we try to use fine-grained
caches. Each job should typically have its own cache to avoid
cross-pollution.
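
A sketch of how a job might wire this up, assuming a GCS bucket named
`example-bazel-cache`, a per-job path inside it, and a credentials file already
written to `$HOME/gcp-credentials.json` (all hypothetical; provisioning the
credentials is omitted). `--remote_cache`, `--google_credentials`, and
`--remote_upload_local_results` are standard Bazel flags; setting the last one
to `false` keeps forked PRs read-only.

```yaml
jobs:
  linux-cpp:
    runs-on: ubuntu-latest
    env:
      # Hypothetical bucket; the per-job suffix keeps each job's cache separate.
      BAZEL_CACHE: --remote_cache=https://storage.googleapis.com/example-bazel-cache/linux-cpp
      # Evaluates to "false" for forked PRs, so they read but never upload.
      BAZEL_UPLOAD: --remote_upload_local_results=${{ github.event_name != 'pull_request_target' }}
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: >
          bazel test //src/...
          $BAZEL_CACHE
          --google_credentials=$HOME/gcp-credentials.json
          $BAZEL_UPLOAD
```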

## Bazel repository cache

When Bazel starts up, it downloads all the external dependencies for a given
build and stores them in the repository cache. This cache is *separate* from
the remote cache, and only exists locally. Because we have so many Bazel
dependencies, this can be a source of frequent flakes due to network issues.

To avoid this, we keep a cached version of the repository cache in GitHub's
action cache. Our full set of repository dependencies ends up being ~300MB,
which is fairly expensive given our 10GB maximum. The most expensive ones seem
to come from Java, which has some very large downstream dependencies.

Given the cost, we take a more conservative approach for this cache. Only push
events will ever write to this cache, but all events can read from it.
Additionally, we only store three caches for any given commit, one per platform.
This means that multiple jobs try to update the same cache, leading to a
race. GitHub rejects all but one of these updates, so we designed the system so
that caches are only updated if they've actually changed. That way, over time
(and multiple pushes) the repository caches will incrementally grow to encompass
all of our dependencies. A scheduled job will run monthly to clear these caches
to prevent unbounded growth as our dependencies evolve.
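
One way to express the read-everywhere, write-on-push policy is to split
`actions/cache` into its `restore` and `save` sub-actions. The cache path and
key scheme are assumptions. The key combines the runner OS and the commit, which
gives one cache per platform per commit; jobs racing on the same key are
resolved by GitHub accepting only one save. This sketch omits the "only save if
the contents changed" check described above.

```yaml
# Steps fragment inside a test job.
- name: Restore repository cache
  uses: actions/cache/restore@v3
  with:
    path: ~/.cache/bazel-repo   # hypothetical --repository_cache location
    key: repo-cache-${{ runner.os }}-${{ github.sha }}
    # Fall back to the most recent cache for this platform.
    restore-keys: repo-cache-${{ runner.os }}-
- name: Run tests
  run: bazel test //src/... --repository_cache=$HOME/.cache/bazel-repo
- name: Save repository cache
  # Only push events write; every other event is read-only.
  if: github.event_name == 'push'
  uses: actions/cache/save@v3
  with:
    path: ~/.cache/bazel-repo
    key: repo-cache-${{ runner.os }}-${{ github.sha }}
```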

## ccache

In order to speed up non-Bazel builds to be on par with Bazel, we make use of
[ccache](https://ccache.dev/). This intercepts all calls to the compiler and
caches the result. Subsequent calls with a cache hit will very quickly
short-circuit and return the already computed result. This has minimal effect
on any *single* job, since we typically only run a single build. However, by
caching the ccache results in GitHub's action cache, we can substantially
decrease the build time of subsequent runs.

One useful feature of ccache is that you can set a maximum cache size, and it
will automatically prune older results to keep below that limit. On Linux and
Mac CMake builds, we generally get 30MB caches and set a 100MB cache limit. On
Windows, with debug symbol stripping, we get ~70MB and set a 200MB cache limit.

Because CMake builds tend to be our slowest, bottlenecking the entire CI process,
we use a fairly expensive strategy with ccache. All events will cache their
ccache directory, keyed by the commit and the branch. This means that each
PR and each branch will write its own set of caches. When looking up which
cache to use initially, each job will first look for a recent cache in its
current branch. If it can't find one, it will accept a cache from the base
branch (for example, PRs will initially use the latest cache from their target
branch).
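
A sketch of this keying scheme with `actions/cache`, using ccache's
`CCACHE_DIR` and `CCACHE_MAXSIZE` environment variables. The job name and cache
directory are assumptions. For pull requests, `github.ref_name` is the PR's
merge ref and `github.base_ref` is its target branch; on pushes
`github.base_ref` is empty, so the second fallback prefix simply never matches.

```yaml
jobs:
  linux-cmake:
    runs-on: ubuntu-latest
    env:
      CCACHE_DIR: ${{ github.workspace }}/.ccache   # hypothetical location
      CCACHE_MAXSIZE: 100M                          # Linux/Mac limit from above
    steps:
      - uses: actions/checkout@v3
      - uses: actions/cache@v3
        with:
          path: .ccache
          # Every run saves a new cache keyed by branch and commit.
          key: ccache-${{ runner.os }}-${{ github.ref_name }}-${{ github.sha }}
          # Prefer the newest cache on this branch, then the base branch's.
          restore-keys: |
            ccache-${{ runner.os }}-${{ github.ref_name }}-
            ccache-${{ runner.os }}-${{ github.base_ref }}-
```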

While the ccache caches quickly overrun our GitHub action cache, they also
quickly become useless. Since GitHub prunes caches based on the time they were
last used, this just means that we'll see quicker turnover.

## Bazelisk

Bazelisk will automatically download a pinned version of Bazel on first use.
This can lead to flakes, so to avoid that we cache the result keyed on the
Bazel version. Only push events will write to this cache, but it's unlikely
to change very often.
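
A sketch of that cache, assuming the version is pinned in a `.bazelversion`
file, that `bazelisk` is on `PATH`, and that Bazelisk uses its default Linux
cache directory (`~/.cache/bazelisk`, which `BAZELISK_HOME` can override):

```yaml
# Steps fragment for a Linux job.
- name: Restore Bazelisk cache
  id: bazelisk-cache
  uses: actions/cache/restore@v3
  with:
    path: ~/.cache/bazelisk
    # Changes only when the pinned Bazel version changes.
    key: bazelisk-${{ runner.os }}-${{ hashFiles('.bazelversion') }}
- name: Run tests
  run: bazelisk test //src/...
- name: Save Bazelisk cache
  # Only push events write, and only when nothing was restored.
  if: github.event_name == 'push' && steps.bazelisk-cache.outputs.cache-hit != 'true'
  uses: actions/cache/save@v3
  with:
    path: ~/.cache/bazelisk
    key: bazelisk-${{ runner.os }}-${{ hashFiles('.bazelversion') }}
```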

## Docker images

Instead of downloading a fresh Docker image for every test run, we can save it
as a tar and cache it using `docker image save`, and later restore it using
`docker image load`. This can decrease download times and also reduce flakes.
Note that `docker image load` can actually be significantly slower than a pull
in certain situations. Therefore, we should reserve this strategy for Docker
images that are causing noticeable flakes.
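
A minimal sketch of the save/load round trip, assuming a pinned image tag
(`example/test-image:1.0`, hypothetical) so that the cache key can stay fixed:

```yaml
# Steps fragment; the image name and tarball path are hypothetical.
- name: Restore image tarball
  id: docker-cache
  uses: actions/cache@v3
  with:
    path: ${{ runner.temp }}/test-image.tar
    key: docker-test-image-1.0
- name: Load cached image
  if: steps.docker-cache.outputs.cache-hit == 'true'
  run: docker image load -i "$RUNNER_TEMP/test-image.tar"
- name: Pull and save image
  # On a miss, pull once and write the tarball; actions/cache uploads it
  # in its post-job step.
  if: steps.docker-cache.outputs.cache-hit != 'true'
  run: |
    docker pull example/test-image:1.0
    docker image save -o "$RUNNER_TEMP/test-image.tar" example/test-image:1.0
```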

## Pip dependencies

The `actions/setup-python` action we use for Python supports automated caching
of pip dependencies. We enable this to avoid having to download these
dependencies on every run, which can lead to flakes.
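
Enabling this is a single input on the action. The Python version and the
`requirements.txt` file below are illustrative assumptions:

```yaml
- uses: actions/setup-python@v4
  with:
    python-version: '3.10'   # illustrative
    # Caches pip's download cache, keyed on a hash of the
    # requirements file(s) found in the repository.
    cache: pip
- run: pip install -r requirements.txt   # hypothetical requirements file
```

The pip cache key is derived from a dependency file, so setup-python expects a
requirements file to exist (or an explicit `cache-dependency-path`).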

# Custom actions

We've defined a number of custom actions to abstract out shared pieces of our
workflows.

- **Bazel**: Use this for running all Bazel tests. It can take either a single
  Bazel command or a more general bash command. In the latter case, it provides
  environment variables for running Bazel with all our standardized settings.

- **Bazel-Docker**: Nearly identical to the **Bazel** action, this additionally
  runs everything in a specified Docker image.

- **Bash**: Use this for running non-Bazel tests. It takes a bash command and
  runs it verbatim. It also handles the regeneration of stale files (which does
  use Bazel), which non-Bazel tests might depend on.

- **Docker**: Nearly identical to the **Bash** action, this additionally runs
  everything in a specified Docker image.

- **ccache**: This sets up a ccache environment and initializes some
  environment variables for standardized usage of ccache.

- **Cross-compile protoc**: This abstracts out the compilation of protoc using
  our cross-compilation infrastructure. It will set a `PROTOC` environment
  variable that gets automatically picked up by a lot of our infrastructure.
  This is most useful in conjunction with the **Bash** action for non-Bazel
  tests.
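
To illustrate the shape of such an action, here is a sketch of a composite
action resembling **Bash**. The file path, input name, and script location are
hypothetical; only the composite-action structure (`runs.using: composite`,
with a `shell` on every `run` step) is standard GitHub Actions syntax.

```yaml
# Hypothetical .github/actions/bash/action.yml
name: Run bash command
description: Regenerate stale files, then run a command verbatim.
inputs:
  command:
    description: The bash command to run.
    required: true
runs:
  using: composite
  steps:
    # Composite action steps must declare a shell.
    - name: Regenerate stale files
      shell: bash
      run: ./regenerate_stale_files.sh   # assumed script location
    - name: Run command
      shell: bash
      run: ${{ inputs.command }}
```

A workflow would then call it with `uses: ./.github/actions/bash` and pass the
test command through the `command` input.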

@@ -0,0 +1,27 @@

name: Forked PR workflow check

# This workflow prevents modifications to our workflow files in PRs from forked
# repositories. Since tests in these PRs always use the workflows in the
# *target* branch, modifications to these files can't be properly tested.

on:
  # safe presubmit
  pull_request:
    branches:
      - main
      - '[0-9]+.x'
      # The 21.x branch still uses Kokoro
      - '!21.x'
      # For testing purposes so we can stage this on the `gha` branch.
      - gha
    paths:
      - '.github/workflows/**'

jobs:
  check:
    name: Check PR source
    runs-on: ubuntu-latest
    steps:
      - run: >
          ${{ github.event.pull_request.head.repo.full_name == 'protocolbuffers/protobuf' }} ||
          (echo "This pull request is from an unsafe fork (${{ github.event.pull_request.head.repo.full_name }}) and isn't allowed to modify workflow files!" && exit 1)

@@ -0,0 +1,17 @@

This directory contains CI-specific tooling.

# Clang wrappers

CMake allows compiler wrappers such as ccache to be injected; ccache intercepts
compiler calls and short-circuits on cache hits. This can be done by specifying
`CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` during CMake's
configure step. Unfortunately, Xcode doesn't provide anything like this, so we
use basic wrapper scripts to invoke ccache + clang.
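
For reference, the two configure styles might look like this as workflow steps.
The `CMAKE_XCODE_ATTRIBUTE_CC`/`CMAKE_XCODE_ATTRIBUTE_CXX` route for pointing
Xcode at a wrapper, and the wrapper names `ci/clang_wrapper` and
`ci/clang_wrapper++`, are assumptions for illustration rather than our confirmed
setup.

```yaml
# Makefile generators: CMake injects ccache as a compiler launcher.
- name: Configure (Makefiles)
  run: >
    cmake -S . -B build
    -DCMAKE_C_COMPILER_LAUNCHER=ccache
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
# Xcode generator: no launcher support, so override the compilers Xcode
# invokes with wrapper scripts that call `ccache clang`.
- name: Configure (Xcode)
  run: >
    cmake -S . -B build-xcode -G Xcode
    -DCMAKE_XCODE_ATTRIBUTE_CC=${{ github.workspace }}/ci/clang_wrapper
    -DCMAKE_XCODE_ATTRIBUTE_CXX=${{ github.workspace }}/ci/clang_wrapper++
```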

# Bazelrc files

In order to allow platform-specific `.bazelrc` flags during testing, we keep
three different versions here, along with a shared `common.bazelrc` that they
all include. Our GHA infrastructure will select the appropriate file for any
test and overwrite the default `.bazelrc` in our workspace, which is intended
for development only.
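
As a sketch of that selection step, assuming the per-platform files are named
after the runner OS (for example `ci/Linux.bazelrc`; these names are
hypothetical) and that each one pulls in `common.bazelrc` through Bazel's
`import` directive:

```yaml
# Each hypothetical ci/<OS>.bazelrc would start with:
#   import %workspace%/ci/common.bazelrc
- name: Select platform .bazelrc
  shell: bash
  # runner.os is Linux, Windows, or macOS.
  run: cp "ci/${{ runner.os }}.bazelrc" .bazelrc
```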