Roll forward #34191, which is reverted due to error `2023-10-09
22:01:18,569 FAILED: cmake/build/client_transport_test
--gtest_filter=ClientTransportTest.AddMultipleStreams
GRPC_POLL_STRATEGY=none` (Removed uses_event_engine=False,
uses_polling=False in test build).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Some user reported an issue that symbols from CoreFoundation are missing
when install grpc/grpc-tools inside Conda:
* https://github.com/grpc/grpc/issues/33714
* https://github.com/grpc/grpc/issues/34135
This PR adds the `-framework CoreFoundation` flag so no workarounds are
required from user.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Since we added support for Python 3.12, our tests have been taking
longer to complete and we've been seeing an increase in timeout errors.
We already increased timeout in #34550, this PR increase timeout for
release and pull_request jobs.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Reverts grpc/grpc#34667
The change was reverted because it failed to import to g3, after some
changes, now it's safe to reapply those changes.
Tested by importing this PR internally, it passed presubmit:
cl/573836270
Partial rollback of https://github.com/grpc/grpc/pull/34473 for RBE as
RBE needs Ubuntu 20.04 to get clang artifacts which Google publishes.
There aren't artifacts for Ubuntu 22.04 yet so there is no point of
upgrading RBE image to 22.04.
Ditch the old priority scheme for ordering filters, instead explicitly
mark up before/after constraints.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Introduce `grpc-native-debug` gems containing debug symbol packages that
complement the shared libraries shipped in pre-compiled binary gems.
src/ruby/nativedebug/README.md has details on usage.
The basic APIs for the CRL Reloading features.
This adds external types to represent CRL Providers, CRLs, and
CertificateInfo.
Internally we will use `CrlImpl` - this layer is needed to hide OpenSSL
details from the user.
GRFC - https://github.com/grpc/proposal/pull/382
Things Done
* Add external API for `CrlProvider`, `Crl`, `CertInfo` (`CertInfo` is
used during CRL lookup rather than passing the entire certificate).
* Add code paths in `ssl_transport_security` to utilize CRL providers
* Add `StaticCrlProvider`
* Refactor `crl_ssl_transport_security_test.cc` so it is more extensible
and can be used with providers
1. Changes the resource retention period to 2 days for all resources
(previously 7 days for TD resources, 6 hours for k8s). This solved a
problem with k8s resources being stuck because corresponding TD
resources weren't deleted.
2. Resume on namespace cleanup failures
3. Add secondary lb cluster cleanup logic
4. Modularize `grpc_xds_resource_cleanup.sh`
5. Make `KubernetesNamespace`'s methods `pretty_format_status` and
`pretty_format_metadata` public
6. `pretty_format_status`: also print resource kind, creation and
deletion requested dates
ref b/259724370, cl/517235715
We're seeing too many debug headers, change it to only log header in
case of error.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Instead of fixing a target size for writes, try to adapt it a little to
observed bandwidth.
The initial algorithm tries to get large writes within 100-1000ms
maximum delay - this range probably wants to be tuned, but let's see.
The hope here is that on slow connections we can not back buffer so much
and so when we need to send a ping-ack it's possible without great
delay.
Split nonbazel test into two; one for bazel build which will be labeled
as required and the other for remining nonbazel tests.
Corresponding internal CL: cl/572652950
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We're not running any test at all from `run_test.py` because of the way
we filter test cases:
1d136fd05f/src/python/grpcio_tests/tests/_runner.py (L137)
* `testcase_filter` is read from a json file (like [this
one](https://github.com/grpc/grpc/blob/master/src/python/grpcio_tests/tests/tests.json))
and test name is similar to `unit._metadata_test.MetadataTest`.
* `case.id()` is loaded by `iterate_suite_cases` and will always have a
prefix of `tests`, an example of case id will be:
`tests.unit._metadata_test.MetadataTest`.
Because of the prefix, none of the test case will be matched thus we're
not running any of the tests.
This PR fixes the prefix issue and all the regressions comes from not
running tests using `run_test.py`.
#### Other Changes
* Added couple of `__init__.py` file since it's required to load tests.
* Added `py_status_code` to Aio rpc state.
* `code()` is expecting to return a python gRPC code but current
`status_code` is a Cython code.
* Added `libsqlite3-dev` to our dockers because it's required for
`coverage==7.2.0`.
* Renamed csds and admin test because test case file have to end with
`_test`:
1d136fd05f/src/python/grpcio_tests/tests/_loader.py (L26)
* Removed gevent test from `run_test.py` because Bazel gevent tests
should be good enough for us.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The `TickForDuration()` method was using `grpc_core::Timestamp::Now()`
to get the current time, but that was not in sync with the `now_` value
inside the Fuzzing EE itself, with the result that after two subsequent
250ms increments, timers were not being properly fired. I've added a
test that demonstrates this failure without the fix.
Experiment 1: On RST_STREAM: reduce MAX_CONCURRENT_STREAMS for one round
trip.
Experiment 2: If a settings frame is outstanding with a lower
MAX_CONCURRENT_STREAMS than is configured, and we receive a new incoming
stream that would exceed the new cap, randomly reject it.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is the initial change of chaotic-good client transport read path,
which is a following PR of the client transport write path at #33876.
There's a pending work of handling endpoint failures in the transport.
It will be added after we have the inter-activity pipe with close
function.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Fixes support for the same address being present more than once in the
address list, which was accidentally broken in #34244.
- Change the call attribute to encode the hash as an integer instead of
a string.
(I had to first solve a problem with "dubious file ownership" error that
was happening inside the grpc_repo_archive action when using the
concurrent wrapper).
Example:
```
tools/docker_runners/examples/concurrent_bazel.sh --bazelrc=tools/remote_build/linux.bazelrc test --genrule_strategy=remote,local --workspace_status_command=tools/bazelify_tests/workspace_status_cmd.sh //tools/bazelify_tests/test:runtests_csharp_linux_dbg
```
We shouldn't just set `termination_grace_period_seconds=600` by default
for all gamma tests extending `GammaXdsKubernetesTestCase`.
This is what's causing the deployment deletion issue:
> `framework.helpers.retryers.RetryError: Retry error calling
framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup: 1
attempts exhausted. Last exception: RetryError: Retry error calling
framework.infrastructure.k8s.KubernetesNamespace.get_deployment: timeout
0:05:00 (h:mm:ss) exceeded. Check result callback returned False.`
We wait for 5 minutes, while the deployment is happily handing for 10.
Then the second cleanup retry kills it - but not before waiting for
another 5 minutes.
I think `self.force = False` may be solving another issue triggered by
the get_deployment retry timeout: because we start over deleting the
resources by name and some of them are deleted from the first attempt we
get 404. And I'm pretty sure we don't do error-handling correctly when
deleting CRD-based resources - which cascades into even more unnecessary
retries.
- GAMMA server runner: increase the wait time for the NEG annotation
from 1 minute to 3.
- Improve the wording around the wait for NEG methods to make it clear
this is the annotation we're waiting for - so it's not confused with
getting the NEG health from the GCP APIs.
ref b/298501683, b/302723651
We originally wanted a common format with internal OWNERS files on the
thought that we'd use them and import them and oh gosh that was the
wrong thing to think.
With upcoming changes to tree management having them here is going to
cause problems, so lets just update the CODEOWNERS files directly.
Foundation for being able to bazelify the build artifact -> build
package -> distribtest workflow tests.
Main ideas:
- "build artifact" and "build packages" will be represented by a custom
genrule (that runs the build on RBE under a docker container).
- since genrule doesn't support displaying logs for each target as a
separate "target log" (in the same way that bazel tests do), and we
generally want readable per-target logs for the bazelified test, a pair
of targets will be created for each "build artifact task":
- a genrule that actually performs the build, creates an archive with
artifacts and stores the exitcode and build log as rule outputs
- a corresponding "build_test" sh_test that simply looks at the result
of the genrule and presents the build log and build result a "target
log" for this test.