The `work_stealing` experiment on its own is not very valuable, so let's
delete it and save CI resources. We have a benchmark for
`GRPC_EXPERIMENTS=event_engine_listener,work_stealing`, which is really
what we care about right now.
This should get the benchmarks running again. The dotnet benchmark is
broken (unclear if it's still necessary), and the grpc-go benchmark
build currently fails. The go benchmark should be re-enabled when the
dockerfiles are fixed. The rest of the dotnet benchmark configuration /
artifacts should be deleted or fixed as well. @jtattermusch
Based on https://github.com/grpc/grpc-go/pull/6463
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Add "bazelified" non-bazel tests. See tools/bazelify_tests/README.md for
the core idea.
- add a bunch of test targets that run under docker and execute tests
that correspond to `run_tests.py -l LANG ...`
- many more tests can be added in the future
- to enable running some of the C/C++ portability tests easily, added
support for `--cmake_extra_configure_args` in run_tests.py (the change
is fairly small).
Example passing build that shows how test results are structured:
https://source.cloud.google.com/results/invocations/21295351-a3e3-4be1-b6e9-aaf52195a044/targets
This enables both of the `event_engine_listener` and `work_stealing`
experiments together, which we expect will have better performance. The
benchmark-config-generation script required some light modification to
support running multiple experiments at the same time.
This adds a new GKE benchmark job, which runs the set of "dashboard"
scenarios for every gRPC experiment configured in the script. Results
are published to BigQuery at
`e2e_benchmarks.ci_cxx_experiment_results_${N}core.${experiment}`
See https://github.com/grpc/grpc/pull/33907 for the scenario config.
Scenarios for language `node` specify the server language as `node`
(instead of leaving it blank), so a flag must be added to
`--allow_server_language=node`.
Scenarios for language `node_purejs` differ in name and in scenario
settings, but otherwise run on identical clients and servers. This
change treats `node_purejs` as `node` for the purpose of generating load
test configurations.
https://github.com/grpc/grpc/pull/33699 incorrectly changed the legacy
builds to not just use the test driver from the master, but also to
build from it. This PR fixes the issue, and also updates the python job
to work use the driver from master.
follow up from #33590
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We are seeing `g++: fatal error: Killed signal terminated program
cc1plus` on PHP distribtest builds. In case it's an OOM, let's try
reducing the build parallelism to see if it helps.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Makes the changes necessary to run the new PSM interop framework on
Ubuntu 22.04:
- Install dependencies via apt: `kubectl`,
`google-cloud-sdk-gke-gcloud-auth-plugin` (previously were
pre-provisioned or available as part of gcloud distribution)
- Use venv instead of pyenv
- Use python 3.10 instead of python3 .9
Other changes:
- Do not update gcloud components - the one provisioned is relatively
recent, and expected to be updated as new base images are released
- Unpin pip from `21.0.1`. Not sure if we're OK about using the latest
one `venv --upgrade-deps`, or if we should just pin it to something more
recent (currently it resolves to `pip 22.0.2` and `setuptools 68.0.0`)
ref b/274944592, cl/547690787
`cmake_ninja_vs2019` and `default` are using the same
`cmake_ninja_vs2019` so having two tests are waste so this is removing
`cmake_ninja_vs2019` leaving `default` which does `cmake_ninja_vs2019`.
This change can cut the space consumption by half and with 250GB disc,
- Pre-test: 267,770,322,944 bytes free
- Post-test: 134,499,295,232 bytes free
This test mode tries to create threads wherever it legally can to
maximize the chances of TSAN finding errors in our codebase.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Yash Tibrewal <yashkt@google.com>
Co-authored-by: Stanley Cheung <stanleycheung@google.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: Yijie Ma <yijiem.main@gmail.com>
Co-authored-by: apolcyn <apolcyn@google.com>
Co-authored-by: Jan Tattermusch <jtattermusch@google.com>
Resolve `TESTING_VERSION` to `dev-VERSION` when the job is initiated by
a user, and not the CI. Override this behavior with setting
`FORCE_TESTING_VERSION`.
This solves the problem with the manual job runs executed against a WIP
branch (f.e. a PR) overriding the tag of the CI-built image we use for
daily testing.
The `dev` and `dev-VERSION` "magic" values supported by the
`--testing_version` flag:
- `dev` and `dev-master` and treated as `master`: all
`config.version_gte` checks resolve to `True`.
- `dev-VERSION` is treated as `VERSION`: `dev-v1.55.x` is treated as
simply `v1.55.x`. We do this so that when manually running jobs for old
branches the feature skip check still works, and unsupported tests are
skipped.
This changes will take care of all langs/branches, no backports needed.
ref b/256845629
The job run time was creeping to the 2h timeout. Let's bump it to 3h.
Note that this is `master` branch, so it also includes the build time
every time we commit to grpc/grpc.
ref b/280784903
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Oops I missed important changes from
https://github.com/grpc/grpc/pull/32712. And it turned out that there
are two problems that I couldn't fix at this point.
- Windows Bazel RBE Linker Error: This may be caused by how new Bazel 6
invokes build tools chain but it's not clear. I put workaround to use
Bazel 5 by using `OVERRIDE_BAZEL_VERSION=5.4.1`
- Rule `rules_pods` to fetch CronetFramework from CocoaPod has
incompatibility with sort of built-in apple toolchain.
(https://github.com/bazel-xcode/PodToBUILD/issues/232): I couldn't find
a workaround to fix this so I ended up disabling all tests depending
this target.
It already started hitting the limit resulting in continuous failure.
https://github.com/grpc/grpc/pull/32603 is believed to contribute to
this time increase but let's bump it first and visit this issue later.