Towards https://github.com/grpc/grpc/issues/33032,
Reopen after botched force-push in #33175 that then got "merged" and
cannot be reopened anymore.
More context in that PR.
---------
Co-authored-by: Isuru Fernando <isuruf@gmail.com>
Co-authored-by: David Chamberlin <david.chamberlin@ln.email.gs.com>
Instead of having the per-call code hop into the `WorkSerializer` to
unref the pickers, have the `SubchannelWrapper` itself hop into the
`WorkSerializer` before it is destroyed.
This also reverts the change made to the WRR picker in #34077, since
that is no longer necessary.
@ctiller PTAL for core configuration changes. I converted the type from
std::function to absl::AnyInvocable. Do you think the functor in
RegisteredBuilder should be callable just once or multiple times?
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Splitting off from https://github.com/grpc/grpc/pull/34273
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
In certain situations the current flow control algorithm can result in
sending one flow control update write for every write sent (known
situation: rollout of promise based server calls with qps_test).
Fix things up so that the updates are only sent when truly needed, and
then fix the fallout (turns out our fuzzer had some bugs)
I've placed actual logic changes behind an experiment so that it can be
incrementally & safely rolled out.
Changes -
1) Change local mesh labels to not be reported on 'started' metrics at
all (even those that we know about) to be consistent. (Since xDS labels
atleast on the server side would not be available on started metric.)
2) Add mesh_id as a local label that is populated by reading the xDS
bootstrap. As part of this, also added a minimal xds bootstrap parsing
logic.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
* Remove `fetch_build_eggs` since they're deprecated and those deps will
be installed by `setuptools`.
* Fix indentation on `run_test` so we don't miss `native` test cases.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
### Background
* `distutils` is deprecated with removal planned for Python 3.12
([pep-0632](https://peps.python.org/pep-0632/)), thus we're trying to
replace all distutils usage with setuptools.
* Please note that user still have access to `distutils` if setuptools
is installed and `SETUPTOOLS_USE_DISTUTILS` is set to `local` (The
default in setuptools, more details can be found [in this
discussion](https://github.com/pypa/setuptools/issues/2806#issuecomment-1193336591)).
### How we decide the replacement
* We're following setuptools [Porting from Distutils
guide](https://setuptools.pypa.io/en/latest/deprecated/distutils-legacy.html#porting-from-distutils)
when deciding the replacement.
#### Replacement not mentioned in the guide
* Replaced `distutils.utils.get_platform()` with
`sysconfig.get_platform()`.
* Based on the [answer
here](https://stackoverflow.com/questions/71664875/what-is-the-replacement-for-distutils-util-get-platform),
and also checked the document that `sysconfig.get_platform()` is good
enough for our use cases.
* Replaced `DistutilsOptionError` with `OptionError`.
* `setuptools.error` is exporting it as `OptionError` [in the
code](https://github.com/pypa/setuptools/blob/v59.6.0/setuptools/errors.py).
* Upgrade `setuptools` in `test_packages.sh` and changed the version
ping to `59.6.0` in `build_artifact_python.bat`.
* `distutils.errors.*` is not fully re-exported until `59.0.0` (See
[this issue](https://github.com/pypa/setuptools/issues/2698) for more
details).
### Changes not included in this PR
* We're patching some compiler related functions provided by distutils
in our code
([example](ee4efc31c1/src/python/grpcio/_spawn_patch.py (L30))),
but since `setuptools` doesn't have similar interface (See [this issue
for more details](https://github.com/pypa/setuptools/issues/2806)), we
don't have a clear path to replace them yet.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This API was [removed in Python
3.12](https://github.com/python/cpython/issues/98040).
Fixes Python 3.12 support in `grpcio` tests.
This is relevant to https://github.com/grpc/grpc/issues/33063.
See also https://github.com/grpc/grpc/pull/33492.
----
I have actually only tested this in a form backported to grpc 1.48.4,
and I am not able to test the change to `bazel/_gevent_test_main.py`
directly. However, the backported form allows me to build grpc 1.48.4
for Fedora Rawhide with Python 3.12, and I believe the version in this
PR to be correct—especially, if CI passes for Python 3.11, I believe
this part of the test code will continue to work in Python 3.12.
We print the status of all experiments at grpc init time, but that
status is in experiment declaration order - which isn't particularly
useful for being able to find if your experiment is enabled or not
quickly.
Change that to lexically sort experiments.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This should address one of the failures we're seeing in #34224.
The test failure is caused by the changes in timing triggering a race
condition. In the code at head, we delay sending out the subscription
for the first CDS watch until we've already seen the other two CDS
watches, because the previous send_message op has not yet completed, and
by the time it does, we've seen all 3 watches, so we can send a
subscription for all 3 at the same time. With the WorkSerializer change,
the send_message op is complete by the time we see the first CDS watch,
so we subscribe to only that resource, and then later add the other two.
The result is that we'll NACK twice with two different messages, the
first one including only the error about the first resource, and the
second one including all three.
I suspect this same race condition would have been triggered eventually
by the EventEngine migration anyway; the current test basically depends
on the single-thread timing of the iomgr approach. So I'm addressing it
by replacing the e2e test with a unit test that covers the same cases
without the timing issue.
This test is not giving us good signals while adding some long term
maintenance cost. The way the test is set up and run was not standard.
There are other sanitizer tests to test for memory leaks.
Without this change, calling `find_package(gRPC REQUIRED)` errors out
with
CMake Error at /opt/homebrew/lib/cmake/grpc/gRPCConfig.cmake:8
(find_dependency):
Unknown CMake command "find_dependency".
The issue is that `find_dependency` is provided by the
`CMakeFindDependencyMacro` module[^1], so we need to `include` it before
use.
[^1]:
https://cmake.org/cmake/help/v3.26/module/CMakeFindDependencyMacro.html
Rolls forward part of the dualstack changes, mostly from #33427 and a
little bit from #32692, both of which were reverted in #33718.
Specifically:
- For petiole policies, unconditionally start health watch on
subchannels, even if client side health checking is not enabled; in this
case, the health watch will report the subchannel's raw connectivity
state.
- Fix edge cases in health check reporting that occur when a watcher is
started before the initial state is reported.
- When client-side health checking fails, add the subchannel's address
to the RPC failure status message.
- Outlier detection now works only via the health checking watch, not
via the raw connectivity state watch.
- Remove now-unnecessary hack to ensure that outlier detection does not
work for pick_first.
This reverts https://github.com/grpc/grpc/pull/34170. We (protobuf) have
decided not to release the plugin feature that would have required this
without an actual use-case asking for it. Since this is *technically* a
breaking change, it's better to roll it back in gRPC until we actually
need it.
- Update to windows RBE docker image, followed by subsequent regenerate
of win RBE toolchain (basically verifying that the README.md for windows
RBE toolchain works well)
- Also move windows docker image from GCR to GAR (which is where our
other testing images already live).
- (cleanup) Move windows RBE docker image to a better location in our
repo.
This PR should be sufficient to demonstrate how we would go about
upgrading the windows RBE build to VS2022 once needed.
- Upgrade windows RBE builds to bazel 6.3.2 (supersedes
https://github.com/grpc/grpc/pull/33880). To be able to do that, the RBE
toolchain needed to be regenerated and
- Also added "--dynamic_mode=off" option for windows build since it
seemed necessary for bazel 6.x builds to pass.
- Wrote instructions for generating windows RBE toolchain using the
`rbe_configs_gen` tool (the original windows RBE toolchain was out of
data and also it was generated by a custom script from
go/rbe-windows-user-guide - using a standard tool is better)
- Wrote instructions for rebuilding the windows RBE docker image.
This addresses the problem where windows RBE is stuck on bazel 5.x
(unlike the rest of the repository) and also documents the steps for
making changes to the RBE docker image (e.g. upgrading the visual studio
version used by RBE).
Since we were planning on adding testing for VS2022 (for which we don't
have any test ATM), this will definitely come handy. With the
documentation the process should now be relatively straightforward.
A new metadata type `x-envoy-peer-metadata` is being introduced. We
don't have a better way to do this at the moment compared to just adding
it in `metadata_batch.h`.
The GSM Observability plugin uses this metadata to send topology
information to peers in the form of serialized and base64 encoded
`google::protobuf::Struct`. The individual keys being used inside the
struct are subject to change.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This rolls forward only the pick_first changes from #32692, which were
rolled back in #33718. Specifically:
- Changes PF to use its own subchannel list implementation instead of
using the subchannel_list library, since the latter will be going away
with the dualstack changes.
- As a result of no longer using the subchannel_list library, PF no
longer needs to set the `GRPC_ARG_INHIBIT_HEALTH_CHECKING` channel arg.
- Adds an option to start a health watch on the chosen subchannel, to be
used in the future when pick_first is the child of a petiole policy.
(Currently, this code is not actually called anywhere.)
Based on updates at https://github.com/grpc/proposal/pull/380
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->