Instead of fixing a target size for writes, try to adapt it a little to
observed bandwidth.
The initial algorithm tries to get large writes within 100-1000ms
maximum delay - this range probably wants to be tuned, but let's see.
The hope here is that on slow connections we can not back buffer so much
and so when we need to send a ping-ack it's possible without great
delay.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We're not running any test at all from `run_test.py` because of the way
we filter test cases:
1d136fd05f/src/python/grpcio_tests/tests/_runner.py (L137)
* `testcase_filter` is read from a json file (like [this
one](https://github.com/grpc/grpc/blob/master/src/python/grpcio_tests/tests/tests.json))
and test name is similar to `unit._metadata_test.MetadataTest`.
* `case.id()` is loaded by `iterate_suite_cases` and will always have a
prefix of `tests`, an example of case id will be:
`tests.unit._metadata_test.MetadataTest`.
Because of the prefix, none of the test case will be matched thus we're
not running any of the tests.
This PR fixes the prefix issue and all the regressions comes from not
running tests using `run_test.py`.
#### Other Changes
* Added couple of `__init__.py` file since it's required to load tests.
* Added `py_status_code` to Aio rpc state.
* `code()` is expecting to return a python gRPC code but current
`status_code` is a Cython code.
* Added `libsqlite3-dev` to our dockers because it's required for
`coverage==7.2.0`.
* Renamed csds and admin test because test case file have to end with
`_test`:
1d136fd05f/src/python/grpcio_tests/tests/_loader.py (L26)
* Removed gevent test from `run_test.py` because Bazel gevent tests
should be good enough for us.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The `TickForDuration()` method was using `grpc_core::Timestamp::Now()`
to get the current time, but that was not in sync with the `now_` value
inside the Fuzzing EE itself, with the result that after two subsequent
250ms increments, timers were not being properly fired. I've added a
test that demonstrates this failure without the fix.
Experiment 1: On RST_STREAM: reduce MAX_CONCURRENT_STREAMS for one round
trip.
Experiment 2: If a settings frame is outstanding with a lower
MAX_CONCURRENT_STREAMS than is configured, and we receive a new incoming
stream that would exceed the new cap, randomly reject it.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is the initial change of chaotic-good client transport read path,
which is a following PR of the client transport write path at #33876.
There's a pending work of handling endpoint failures in the transport.
It will be added after we have the inter-activity pipe with close
function.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Fixes support for the same address being present more than once in the
address list, which was accidentally broken in #34244.
- Change the call attribute to encode the hash as an integer instead of
a string.
We shouldn't just set `termination_grace_period_seconds=600` by default
for all gamma tests extending `GammaXdsKubernetesTestCase`.
This is what's causing the deployment deletion issue:
> `framework.helpers.retryers.RetryError: Retry error calling
framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup: 1
attempts exhausted. Last exception: RetryError: Retry error calling
framework.infrastructure.k8s.KubernetesNamespace.get_deployment: timeout
0:05:00 (h:mm:ss) exceeded. Check result callback returned False.`
We wait for 5 minutes, while the deployment is happily handing for 10.
Then the second cleanup retry kills it - but not before waiting for
another 5 minutes.
I think `self.force = False` may be solving another issue triggered by
the get_deployment retry timeout: because we start over deleting the
resources by name and some of them are deleted from the first attempt we
get 404. And I'm pretty sure we don't do error-handling correctly when
deleting CRD-based resources - which cascades into even more unnecessary
retries.
- GAMMA server runner: increase the wait time for the NEG annotation
from 1 minute to 3.
- Improve the wording around the wait for NEG methods to make it clear
this is the annotation we're waiting for - so it's not confused with
getting the NEG health from the GCP APIs.
ref b/298501683, b/302723651
We originally wanted a common format with internal OWNERS files on the
thought that we'd use them and import them and oh gosh that was the
wrong thing to think.
With upcoming changes to tree management having them here is going to
cause problems, so lets just update the CODEOWNERS files directly.
Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Waiting for an active channel takes less time in non-gamma test suites
because they only start waiting after already waited for the TD backends
to be created and report healthy.
In GAMMA, these resources are created asynchronously by Kubernetes. To
compensate for this, we double the timeout for GAMMA tests.
This change is to update the TD bootstrap generator for prod tests. This
is part of the TD release process. The new image has already been merged
to staging and tested locally in google3.
cc: @sergiitk PTAL.
Passed manual run:
https://fusion2.corp.google.com/invocations/0e337438-8573-4b54-ba1c-242b1ad58586
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Support Python 3.12.
### Testing
* Passed all Distribution Tests.
* Also tested locally by installing 3.12 artifact.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The deleted code here was overriding the
[intended](866fc41067/tools/run_tests/run_tests.py (L62))
default test env of `GRPC_VERBOSITY=DEBUG`.
I'm just deleting it because it looks like`GRPC_TRACE=api` is not having
any affect anyways, since it relies on `GRPC_VERBOSITY=DEBUG` which it
happens to be unsetting.
Added a separate distribtests for gRPC C++ DLL build on Windows. This
DLL build is a community support so it should be independently run from
the existing Windows distribtests. Actual DLL test will be added.
Working towards testing against CSM Observability. Added ability to
register a prometheus exporter with our Opentelemetry plugin. This will
allow our metrics to be available at the standard prometheus port
`:9464`.
Since many tests now run reliably as bazelified tests on RBE, we can
remove them from presubmit runs
to speedup testing of PRs.
(for now, these jobs will still run on master, they can be removed from
master as a followup).
- linux/grpc_distribtests_standalone is now fully covered by bazel test
suite
a3b4c797a7/tools/bazelify_tests/test/BUILD (L202),
setting them to `presubmit=False` will stop tests from running on PRs.
- stop running tests from grpc_bazel_distribtest on PR, instead rely on
bazel distribtests running as bazelified tests.
This is just an initial scope of tests. Much of this code was written by
@ginayeh . I just did the final polish/integration step.
There are 3 main tests included:
1. The GAMMA baseline test, including the [actual GAMMA
API](https://gateway-api.sigs.k8s.io/geps/gep-1426/) rather than vendor
extensions.
2. Kubernetes-based stateful session affinity tests, where the mesh
(including SSA configuration) is configured using CRDs
3. GCP-based stateful session affinity tests, where the mesh is
configured using the networkservices APIs directly
Tests 1 and 2 will run in both prod and GKE staging, i.e.
`container.googleapis.com` and
`staging-container.sandbox.googleapis.com`. The latter of these will act
as an early detection mechanism for regressions in the controller that
translates Gateway resources into networkservices resources.
Test 3 will run against `staging-networkservices.sandbox.googleapis.com`
to act as an early detection mechanism for regressions in the control
plane SSA implementation.
The scope of the SSA tests is still fairly minimal. Session drain
testing is in-progress but not included in this PR, though several
elements required for it are (grace period, pre-stop hook, and the
ability to kill a single pod in a deployment).
---------
Co-authored-by: Jung-Yu (Gina) Yeh <ginayeh@google.com>
Co-authored-by: Sergii Tkachenko <sergiitk@google.com>
Distribtests is failing with the following error:
```
Collecting twine<=2.0
Downloading twine-2.0.0-py3-none-any.whl (34 kB)
Collecting pkginfo>=1.4.2 (from twine<=2.0)
Downloading pkginfo-1.9.6-py3-none-any.whl (30 kB)
Collecting readme-renderer>=21.0 (from twine<=2.0)
Obtaining dependency information for readme-renderer>=21.0 from 992e0e21b36c98bc06a55e514cb323/readme_renderer-42.0-py3-none-any.whl.metadata
Downloading readme_renderer-42.0-py3-none-any.whl.metadata (2.8 kB)
Collecting requests>=2.20 (from twine<=2.0)
Obtaining dependency information for requests>=2.20 from 0e2d847013cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl.metadata
Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting requests-toolbelt!=0.9.0,>=0.8.0 (from twine<=2.0)
Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 kB 5.1 MB/s eta 0:00:00
Requirement already satisfied: setuptools>=0.7.0 in ./venv/lib/python3.9/site-packages (from twine<=2.0) (68.2.0)
Collecting tqdm>=4.14 (from twine<=2.0)
Obtaining dependency information for tqdm>=4.14 from f12a80907dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl.metadata
Downloading tqdm-4.66.1-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 7.6 MB/s eta 0:00:00
Collecting nh3>=0.2.14 (from readme-renderer>=21.0->twine<=2.0)
Downloading nh3-0.2.14.tar.gz (14 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Cargo, the Rust package manager, is not installed or is not on PATH.
This package requires Rust and Cargo to compile extensions. Install it through
the system's package manager or via https://rustup.rs/
Checking for Rust toolchain....
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
```
### Why
* We're pulling readme_renderer 42.0 from twine, since 42.0 requires nh3
and nh3 requires Rust, the test is failing.
### Fix
* Pinged readme_renderer to `<40.0` since any version higher or equal to
40.0 requires Python 3.8.
### Testing
* Passed manual run: http://sponge/57d815a7-629f-455f-b710-5b80369206cd
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
* Remove `fetch_build_eggs` since they're deprecated and those deps will
be installed by `setuptools`.
* Fix indentation on `run_test` so we don't miss `native` test cases.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
### Background
* `distutils` is deprecated with removal planned for Python 3.12
([pep-0632](https://peps.python.org/pep-0632/)), thus we're trying to
replace all distutils usage with setuptools.
* Please note that user still have access to `distutils` if setuptools
is installed and `SETUPTOOLS_USE_DISTUTILS` is set to `local` (The
default in setuptools, more details can be found [in this
discussion](https://github.com/pypa/setuptools/issues/2806#issuecomment-1193336591)).
### How we decide the replacement
* We're following setuptools [Porting from Distutils
guide](https://setuptools.pypa.io/en/latest/deprecated/distutils-legacy.html#porting-from-distutils)
when deciding the replacement.
#### Replacement not mentioned in the guide
* Replaced `distutils.utils.get_platform()` with
`sysconfig.get_platform()`.
* Based on the [answer
here](https://stackoverflow.com/questions/71664875/what-is-the-replacement-for-distutils-util-get-platform),
and also checked the document that `sysconfig.get_platform()` is good
enough for our use cases.
* Replaced `DistutilsOptionError` with `OptionError`.
* `setuptools.error` is exporting it as `OptionError` [in the
code](https://github.com/pypa/setuptools/blob/v59.6.0/setuptools/errors.py).
* Upgrade `setuptools` in `test_packages.sh` and changed the version
ping to `59.6.0` in `build_artifact_python.bat`.
* `distutils.errors.*` is not fully re-exported until `59.0.0` (See
[this issue](https://github.com/pypa/setuptools/issues/2698) for more
details).
### Changes not included in this PR
* We're patching some compiler related functions provided by distutils
in our code
([example](ee4efc31c1/src/python/grpcio/_spawn_patch.py (L30))),
but since `setuptools` doesn't have similar interface (See [this issue
for more details](https://github.com/pypa/setuptools/issues/2806)), we
don't have a clear path to replace them yet.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->