This allows us to replace `absl::optional<TaskHandle>` with checks
against the invalid handle.
This PR also replaces the differently-named invalid handle instances
with a uniform way of accessing static invalid instances across all
handle types, which aids a bit in testing.
Add an event manager that spawns threads just as much as it possibly
can... to expose TSAN to the myriad thread ordering problems in our code
base.
Next steps for this will be to add a new test mode for tsan + thready
event engine + a few other doodads to increase threads in the system
(party.cc in particular has a good place for a hook).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
(part of removing support for VS2017)
Also see https://github.com/grpc/grpc/pull/32649
Also see https://github.com/grpc/grpc/pull/32615
The switch to grpc-win2019 windows workers has already happened:
(cl/517400022).
Once this PR lands, I'll backport to 1.53.x branch as well (since that
release removes the VS2017 support).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Initial PR to establish a bazel dependency on
https://github.com/google/fuzztest, with which I'm planning on basing a
hardening program.
Casting a relatively wide net with reviewers: I'm genuinely interested
in feedback building up the docs, and general ergonomics of this change.
I've located relevant files in the `fuzztest/...` directory. The tests
only build with the `--config fuzztest` bazel argument for now (because
of needing C++17), so locating them separately keeps `bazel test
test/...` working as it does today. In a few years time, when we adopt
C++17, we'll be able to rationalize the test directories a little bit.
We'll need to add some kokoro jobs (maybe with this PR?) to execute the
relevant tests.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This was missed from https://github.com/grpc/grpc/pull/32596, resulting
in k8s teardown exiting early with:
```
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: OMITTED
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps \"psm-grpc-client\" not found","reason":"NotFound","details":{"name":"psm-grpc-client","group":"apps","kind":"deployments"},"code":404}
```
With this change, teardown resumes with:
> `k8s_base_runner.py:282] Deployment psm-grpc-client deletion failed:
Kubernetes API returned 404 Not Found: deployments.apps
"psm-grpc-client" not found`
Fix incompatibilities identified when running adhoc runs on the new
custom win2019 image.
After merging this, it should be possible to switch to the new image
without breaking any tests.
- for most fixes I added a comment that explains why they're necessary.
- the new image won't have VS2015 installed, so I'm switching the protoc
artifact build to VS2017
This PR will need to be backported to older release branches to ensure
the windows tests continue working on those branches as well (IMHO I
haven't made any changes that would be difficult to backport and I tried
to keeps the diff as small as possible to avoid issues when
backporting).
After we switch to the new image (and all the windows tests are green),
we can incrementally move the builds that are still using VS2017 to
VS2019.
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
- Increase kubernetes library default for urlib3 retries to 10
- Add custom retry logic to all API calls made by framework.k8s
Custom retry logic handles various errors we're experienced over
two years, and based on ~140 failure reports:
1. Errors returned by the k8s API server itself:
- 401 Unauthorized
- 409 Conflict
- 429 Too Many Requests
- 500 Internal Server Error
2. Connection errors that might indicate k8s API server is temporarily
unavailable (such as a restart, upgrade, etc):
- All `NewConnectionError`s, f.e. "Connection timed out",
"Connection refused"
- All "connection aborted" `ProtocolError`s, f.e. "Remote end
closed connection without response", "Connection reset by peer"
ref b/178378578, b/258546394
PSM Interop: Local dev various improvements
- Cleanup resources on ctrl+c
- Add startup probes to address the issue with port forwarding starting
before the workload listens on a port
- Remove misleading restartPolicy: it's silently ignored by k8s
- Extra debug message with port-forwarding command
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
This test has been occasionally failing on CI with "Bus Error" crashes
while requiring the grpc shared library.
These crashes have been unreproducible locally. Let's continue debugging
(b/266212253) but skip this on CI.
Upgrade boringssl to the latest "master-with-bazel"
- use the `'USE_HEADERMAP' => 'NO'` fix for ObjC
- update the key for asm optimizations on mac/apple in python's setup.py
This PR depends on monterey fixes here:
https://github.com/grpc/grpc/pull/32493 and the boringssl's build
simplification
https://boringssl-review.googlesource.com/c/boringssl/+/56465.
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
Make remaining objC jobs compatible with kokoro monterey workers and
prepare for boringssl upgrade.
The changes here are taken from https://github.com/grpc/grpc/pull/32357,
but they should be merged in a separate PR
(we need the changes to be able to upgrade to monterey anyway and
there's no reason to make the boringssl upgrade PR more complicated by
bundling more fixes into it).
I've checked that the grpc_basictests_objc_examples and
grpc_ios_binary_size are green if switched to monterey.
Unfortunately it's hard to make grpc_basictests_objc_examples pass on
both monterey and mojave, so I suggest merging this PR at the same time
as CL to upgrade the kokoro jobs to monterey.
- that way both PR and continuous runs will remain green
- older branches would need a backport anyway
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
This version broke backward compatibility in `plugin_pb2.py`, which is
presumably a relatively minor regression, since we have not yet heard
any complaints about it. This PR:
- Excludes `4.22.0` from installation
- _Includes_ protobuf pre-releases into testing so this can be caught
more quickly in the future.
When bad prereleases are caught, we can exclude them from testing in a
similar manner to this PR. We may eventually want to invest into a
system where we can define these bad versions centrally.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Cleanup and remove ios cpp test cronet
To test manually:
./tools/bazel test //src/objective-c/tests:CppCronetTests
@sampajano
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fixes the banned function checker to include header files. Some things
had slipped through, such as `absl::make_unique`
033d55ffd3/tools/run_tests/sanity/core_banned_functions.py (L64-L65)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
* [http] Dont drop connections on metadata limit exceeded
* remove bad test
* Automated change: Fix sanity tests
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* Support python 3.11 on aarch64
Fixes https://github.com/grpc/grpc/issues/30927
* Change base tag to something more specific
* Update current version
---------
Co-authored-by: Richard Belleville <rbellevi@google.com>
Our test clusters are in the broken/dirty state (except the URL map's
"basic" cluster, which isn't affected).
This PR switches to the newly created clusters to:
1. Get a data point on whether newly created clusters are affected
by the same issues.
2. Allow for descriptive work on the old clusters.
3. Hopefully, bring our tests back to green.
Bonus: more sensible cluster names.