The previous hack had bitrotted and was just returning 3.7. This method
was given to us by the GCF team and has backward compatibility
guarantees.
This will also help us to ensure that we don't accidentally remove
support for a particular Python runtime version before GCF does.
We shouldn't just set `termination_grace_period_seconds=600` by default
for all gamma tests extending `GammaXdsKubernetesTestCase`.
This is what's causing the deployment deletion issue:
> `framework.helpers.retryers.RetryError: Retry error calling
framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup: 1
attempts exhausted. Last exception: RetryError: Retry error calling
framework.infrastructure.k8s.KubernetesNamespace.get_deployment: timeout
0:05:00 (h:mm:ss) exceeded. Check result callback returned False.`
We wait for 5 minutes, while the deployment is happily handing for 10.
Then the second cleanup retry kills it - but not before waiting for
another 5 minutes.
I think `self.force = False` may be solving another issue triggered by
the get_deployment retry timeout: because we start over deleting the
resources by name and some of them are deleted from the first attempt we
get 404. And I'm pretty sure we don't do error-handling correctly when
deleting CRD-based resources - which cascades into even more unnecessary
retries.
- GAMMA server runner: increase the wait time for the NEG annotation
from 1 minute to 3.
- Improve the wording around the wait for NEG methods to make it clear
this is the annotation we're waiting for - so it's not confused with
getting the NEG health from the GCP APIs.
ref b/298501683, b/302723651
#34274 is still seeing some headwinds against landing it (I continue to
believe it's the correct fix however).
I'm putting this out as a potential short term mitigation for the load
effects we were seeing, so we can try with folks experiencing problems
and see if it unblocks them.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We originally wanted a common format with internal OWNERS files on the
thought that we'd use them and import them and oh gosh that was the
wrong thing to think.
With upcoming changes to tree management having them here is going to
cause problems, so lets just update the CODEOWNERS files directly.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Foundation for being able to bazelify the build artifact -> build
package -> distribtest workflow tests.
Main ideas:
- "build artifact" and "build packages" will be represented by a custom
genrule (that runs the build on RBE under a docker container).
- since genrule doesn't support displaying logs for each target as a
separate "target log" (in the same way that bazel tests do), and we
generally want readable per-target logs for the bazelified test, a pair
of targets will be created for each "build artifact task":
- a genrule that actually performs the build, creates an archive with
artifacts and stores the exitcode and build log as rule outputs
- a corresponding "build_test" sh_test that simply looks at the result
of the genrule and presents the build log and build result a "target
log" for this test.
Previously it turns out it was not safe to run grpc_init in a filter
test - we'd end up mixing event engine implementations, and causing
undefined behavior at grpc_shutdown.
This change makes it safe and fixes a test internally that's flaking at
70% right now (b/302986486).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix chttp2 too_many_pings test to use only one of IPv4 or IPv6,
depending on test environment.
Also fix dumb reversed conditional bug in some other tests that was
accidentally introduced in #34426.
Looks like we've got a thread race on shutdown with some of these
tests... adding a barrier at the head of tests that require precise
transport counts in order to stabilize.
To avoid depending on transitive includes, specially avoid relying on
transitive include status.h -> str_cat.h that is removed in the latest
version of abseil
Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
More changes as part of the dualstack design:
- Change resolver and LB policy APIs to support multiple addresses per
endpoint. Specifically, replace `ServerAddress` with
`EndpointAddresses`, which encodes more than one address. Per-address
channel args are retained at the same level, so they are now
per-endpoint. For now, `EndpointAddress` provides a single-address ctor
and a single-address accessor for backward compatibility, so
`ServerAdress` is an alias for `EndpointAddresses`; eventually, this
alias and the single-address methods will be removed.
- Add an `EndpointAddressSet` class, which represents an unordered set
of addresses to be used as a map key. This will be used in a number of
LB policies that need to store per-endpoint state.
- Change the LB policy API's `ChannelControlHelper::CreateSubchannel()`
method to take the address and per-endpoint channel args as separate
parameters, so that we don't need to construct a legacy `ServerAddress`
object as we create a new subchannel for each address in the endpoint.
- Change pick_first to flatten the address list.
- Change ring_hash to use `EndpointAddressSet` as the key for its
endpoint map, and to use the first address of the endpoint as the hash
key.
- Change WRR to use `EndpointAddressSet` as the key for its endpoint
weight map.
Note that support for multiple addresses per endpoint is guarded in RR
by the existing `round_robin_delegate_to_pick_fist` experiment and in
WRR by the existing `wrr_delegate_to_pick_first` experiment.
This PR does *not* include support for multiple addresses per endpoint
for the outlier_detection or xds_override_host LB policies; those will
come in subsequent PRs.
Waiting for an active channel takes less time in non-gamma test suites
because they only start waiting after already waited for the TD backends
to be created and report healthy.
In GAMMA, these resources are created asynchronously by Kubernetes. To
compensate for this, we double the timeout for GAMMA tests.
Really minimal change to make the output buffer for chttp2 be a
`grpc_core::SliceBuffer` so that we can start mixing in the new framer
code.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix: https://github.com/grpc/grpc/issues/33890
In case of abort, currently we don't log anything, this change added an
`AbortError` as the default error for abort, if any other error
happened, we'll log the error so user will be aware of the other error.
Related gRFC: https://github.com/grpc/proposal/pull/388
### Testing
* Tested locally, after this change, any non-AbortError will be logged,
sample log:
```
ERROR:grpc._server:Exception happened while abort: Other exception happened!
Traceback (most recent call last):
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/9/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests/unit/_abort_test.native.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/_server.py", line 553, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/9/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests/unit/_abort_test.native.runfiles/com_github_grpc_grpc/src/python/grpcio_tests/tests/unit/_abort_test.py", line 95, in abort_with_status_unary_unary_raise_additional_exception
raise Exception("Other exception happened!")
Exception: Other exception happened!
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This change is to update the TD bootstrap generator for prod tests. This
is part of the TD release process. The new image has already been merged
to staging and tested locally in google3.
cc: @sergiitk PTAL.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->