Fixes internal b/297030898.
This ought not to be necessary, but getting context from the promise-based
filter wrapper code at call destruction is unwieldy.
Hacking around it here and fixing it properly after the client rollout seems
wiser.
This turns on the `work_stealing` experiment everywhere, enabling the
work-stealing thread pool in Posix & Windows EventEngine
implementations. This only really has an effect when EventEngine is used
for I/O (currently flag-guarded).
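For background, here is a very small illustration of the work-stealing idea, assuming per-worker queues guarded by mutexes; the real EventEngine pool is considerably more sophisticated, and every name below is purely illustrative:
```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

// One queue per worker thread (illustrative only).
struct WorkQueueSketch {
  std::mutex mu;
  std::deque<std::function<void()>> tasks;
};

// Returns the next task for worker `self`: prefer the worker's own queue
// (LIFO), and steal from another worker's queue (FIFO) only when it's empty.
// Returns an empty std::function if there is no work anywhere.
std::function<void()> NextTask(std::vector<WorkQueueSketch>& queues,
                               size_t self) {
  {
    std::lock_guard<std::mutex> lock(queues[self].mu);
    if (!queues[self].tasks.empty()) {
      auto task = std::move(queues[self].tasks.back());
      queues[self].tasks.pop_back();
      return task;
    }
  }
  for (size_t i = 0; i < queues.size(); ++i) {
    if (i == self) continue;
    std::lock_guard<std::mutex> lock(queues[i].mu);
    if (!queues[i].tasks.empty()) {
      auto task = std::move(queues[i].tasks.front());  // steal oldest work
      queues[i].tasks.pop_front();
      return task;
    }
  }
  return {};
}
```
Popping LIFO from the local queue keeps recently submitted (cache-hot) work local, while stealing FIFO lets otherwise idle threads drain the oldest pending work.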
* Before change:
* Test hangs [here](https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/_cython/_cygrpc/aio/call.pyx.pxi#L250) (called from [here](3b23fe62ca/src/python/grpcio/grpc/aio/_call.py (L261))) waiting for status.
* After change, we'll manually set the status. We also print a traceback like the following:
```
Traceback (most recent call last):
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/aio/_call.py", line 492, in _writ
e
await self._cython_call.send_serialized_message(serialized_request)
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/call.pyx.pxi", line 379, in send_serialized_message
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 163, in _send_message
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 160, in _cython.cygrpc._send_message
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 106, in execute_batch
_cython.cygrpc.ExecuteBatchError: Failed grpc_call_start_batch: 11 with grpc_call_error value: 'GRPC_CALL_ERROR_INVALID_MESSAGE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio_tests/tests_aio/unit/_test_base.py", l
ine 31, in wrapper
return loop.run_until_complete(f(*args, **kwargs))
File "/usr/local/google/home/xuanwn/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio_tests/tests_aio/unit/metadata_test.py"
, line 310, in test_stream_unary
await call.write(_REQUEST)
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/aio/_call.py", line 517, in write
await self._write(request)
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/aio/_call.py", line 495, in _writ
e
await self._raise_for_status()
File "/usr/local/google/home/xuanwn/.cache/bazel/_bazel_xuanwn/da3828576aa39e99a5c826cc2e2e22fb/sandbox/linux-sandbox/1576/execroot/com_github_grpc_grpc/bazel-out/k8-fastbuild/bin/src/python/grpcio_tests/tests_aio/unit/metadata_test.runfiles/com_github_grpc_grpc/src/python/grpcio/grpc/aio/_call.py", line 263, in _rais
e_for_status
raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Internal error from Core"
debug_error_string = "Failed grpc_call_start_batch: 11 with grpc_call_error value: 'GRPC_CALL_ERROR_INVALID_MESSAGE'"
>
```
This WorkStealingThreadPool change improves the (lagging)
`cpp_protobuf_async_streaming_qps_unconstrained_secure` 32-core
benchmark.
- Baseline OriginalThreadPool QPS: 830k
- Previous average WorkStealingThreadPool QPS: 755k
- New WorkStealingThreadPool average (2 runs) QPS: 850k
Update from gtcooke94:
This PR adds support for building gRPC and its tests with OpenSSL 3. There
were some hiccups because the OpenSSL tests hadn't been built or exercised in
a few months, so they needed some work to fix.
Right now I expect all test files to pass except the following:
- h2_ssl_cert_test
- ssl_transport_security_utils_test
I confirmed locally that these tests fail with OpenSSL 1.1.1 as well,
so we are at least not introducing regressions. I've added compiler directives around these tests so they only build when using BoringSSL.
---------
Co-authored-by: Gregory Cooke <gregorycooke@google.com>
Co-authored-by: Esun Kim <veblush@google.com>
- add a debug-only `WorkSerializer::IsRunningInWorkSerializer()` method
and use it in client_channel to verify that subchannels are destroyed in
the `WorkSerializer` (see the sketch after this list)
- note: this mechanism uses `std::thread::id`, so I had to exclude
work_serializer.cc from the core_banned_constructs check
- fix `WorkSerializer::Run()` to unref the callback before releasing
ownership of the `WorkSerializer`, so that any refs captured by the
`std::function<>` will be released before releasing ownership
- fix the WRR timer callback to hop into the `WorkSerializer` to release
its ref to the picker, since that transitively releases refs to
subchannels
- fix subchannel connectivity state notifications to unref the watcher
inside the `WorkSerializer`, since the watcher often transitively holds
refs to subchannels
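For reference, a rough sketch of what such a debug-only check could look like, assuming the `WorkSerializer` records the `std::thread::id` of the thread currently draining its queue; all names and details below are illustrative, not the actual implementation:
```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch: record which thread currently "owns" the serializer
// so callers can assert they are running inside it.
class WorkSerializerSketch {
 public:
  bool IsRunningInWorkSerializer() const {
    return running_thread_.load(std::memory_order_relaxed) ==
           std::this_thread::get_id();
  }

 private:
  // Called by the draining thread when it takes / releases ownership.
  void SetRunningThread() {
    running_thread_.store(std::this_thread::get_id(),
                          std::memory_order_relaxed);
  }
  void ClearRunningThread() {
    running_thread_.store(std::thread::id(), std::memory_order_relaxed);
  }

  // Default-constructed std::thread::id never matches a real thread.
  std::atomic<std::thread::id> running_thread_{std::thread::id()};
};
```
Storing the id atomically keeps the check cheap enough to leave enabled in debug builds.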
CNR (could not reproduce) the flake, but I've changed the test (which is very
old) to use some of our more modern helper functions that have saner timeouts.
Also re-add a `return` statement that was accidentally removed in
#33753, which I noticed while working on this. Its absence doesn't cause
a real problem, but it does cause us to needlessly trigger a duplicate
connection attempt or report a duplicate CONNECTING update in some
cases.
Proposed alternative to https://github.com/grpc/grpc/pull/34024.
This version has a simpler, faster busy-count implementation based on a
sharded set of atomic counts: fast increment/decrement operations, and
relatively slower summation of the total count (which needs to happen much
less frequently).
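As a rough illustration of the sharded-count idea; the shard count, padding, and hashing scheme below are illustrative choices, not the actual implementation:
```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Increments/decrements touch a single shard (cheap); Sum() walks every
// shard (slower, but only needs to run occasionally).
class ShardedBusyCountSketch {
 public:
  void Increment() { Shard().fetch_add(1, std::memory_order_relaxed); }
  void Decrement() { Shard().fetch_sub(1, std::memory_order_relaxed); }

  // Slow path: totals all shards.
  int64_t Sum() const {
    int64_t total = 0;
    for (const PaddedCount& shard : shards_) {
      total += shard.count.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  static constexpr size_t kShards = 16;  // illustrative shard count
  struct alignas(64) PaddedCount {       // padded to avoid false sharing
    std::atomic<int64_t> count{0};
  };

  // Pick a shard based on the calling thread so different threads usually
  // touch different cache lines.
  std::atomic<int64_t>& Shard() {
    size_t idx =
        std::hash<std::thread::id>()(std::this_thread::get_id()) % kShards;
    return shards_[idx].count;
  }

  PaddedCount shards_[kShards];
};
```
The trade-off is that `Sum()` is O(shards) and only eventually consistent, which is fine for a busy count that is read far less often than it is updated.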
WRR is showing a very high CPU cost relative to previous solutions, and
it's unclear why.
Add two metrics that should help us see the shape of the subchannel sets
being passed to high-cost systems, in order to confirm or deny theories.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
De-experiment pick_first since we have both the affinity and randomness E2E
tests running successfully.
---------
Co-authored-by: Yash Tibrewal <yashkt@google.com>
This PR is mainly a set of improvements that allow the C++ Alarm to be
migrated away from legacy iomgr. It cannot be landed without a significant
speedup, because third parties rely on a fast path for immediate timer
execution with deadlines <= now.
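A tiny sketch of that fast-path idea, with hypothetical names standing in for the real Alarm/EventEngine plumbing:
```cpp
#include <chrono>
#include <functional>

// `schedule_timer` is a placeholder for whatever actually arms a timer;
// it is not a real gRPC API.
void SetAlarmSketch(std::chrono::steady_clock::time_point deadline,
                    std::function<void(bool triggered)> on_done,
                    std::function<void()> schedule_timer) {
  if (deadline <= std::chrono::steady_clock::now()) {
    // Fast path: the deadline has already passed, so fire the callback
    // inline instead of paying for timer setup and a thread hop.
    on_done(true);
    return;
  }
  // Slow path: arm a real timer that will eventually invoke on_done.
  schedule_timer();
}
```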
Previous EventEngine performance of bm_alarm, compared to baseline iomgr
timers: *0.014%*
This PR: *2.5%*
Regarding previous failures to land this change: The cloud libraries
team agreed to reduce the amount of stress in their alarm stress test
https://github.com/googleapis/google-cloud-cpp/pull/12378
- Extract build metadata for some external dependencies from bazel
build. This is achieved by letting extract_metadata_from_bazel_xml.py
analyze some external libraries and sources. The logic is basically the
same as for internal libraries, I only needed to teach
extract_metadata_from_bazel_xml.py which external libraries it is
allowed to analyze.
* currently, the list of source files is automatically determined for the
`z`, `upb`, `re2` and `gtest` dependencies (at least for the case where
we're building in "embedded" mode - e.g. mostly native extensions for
python, php, ruby etc. - cmake has the ability to replace some of these
dependencies with an actual cmake dependency).
- Eliminate the need for manually written gen_build_yaml.py for some
dependencies.
- Make the info on target dependencies in build_autogenerated.yaml more
accurate and complete. Until now, there were some dependencies that
were allowed to show up in build_autogenerated.yaml and some that were
being skipped. This made generating the CMakeLists.txt and Makefile
quite confusing (since some dependencies were explicitly mentioned
and some had to be assumed by the build system).
- Overhaul the Makefile
* the Makefile is currently only used internally (e.g. for ruby and PHP
builds)
* until now, the makefile wasn't really using the info about which
targets depend on what libraries, but it was effectively hardcoding the
dependency data (by magically "knowing" the full list of stuff that
e.g. "grpc" depends on).
* After the overhaul, the Makefile.template now actually looks at the
library dependencies and uses them when generating the Makefile. This
gives a more correct and easier-to-maintain Makefile.
* since csharp is no longer on the master branch, remove all mentions of
"csharp" targets in the Makefile.
Other notable changes:
- make extract_metadata_from_bazel_xml.py capable of resolving workspace
bind() rules (so that it knows the real name of the target that is
referred to as e.g. `//external:xyz`)
TODO:
- [DONE] ~~pkgconfig C++ distribtest~~
- [DONE] ~~update third_party/README to reflect changes in how some deps
get updated now.~~
Planned followups:
- cleanup naming of some targets in build metadata and buildsystem
templates: libssl vs boringssl, ares vs cares etc.
- further cleanup of Makefile
- further cleanup of CMakeLists.txt
- remove the need for manually hardcoding extra metadata for targets in
build_autogenerated.yaml. Either add logic that determines the
properties of targets automatically, or use metadata from bazel BUILD.
Our current implementations of `Join` and `TryJoin` leverage some complicated
template stuff to work, which makes them hard to maintain. I've been
thinking about ways to simplify that for some time and had something
like this in mind - using a code generator that's at least a little more
understandable to generate most of the complexity into a file that can be
checked in.
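For context, here's a toy model of what a two-promise `Join` does, using `std::function`-based promises that return `nullopt` while pending; this is purely illustrative, and the real combinators are templated precisely so they can avoid this kind of type erasure and allocation:
```cpp
#include <functional>
#include <memory>
#include <optional>
#include <utility>

// A toy "promise": a callable returning nullopt while pending, a value once
// ready.
using IntPromise = std::function<std::optional<int>()>;
using PairPromise = std::function<std::optional<std::pair<int, int>>()>;

// Join two promises: poll both each time, resolve only once both are ready.
PairPromise Join2(IntPromise a, IntPromise b) {
  auto ra = std::make_shared<std::optional<int>>();
  auto rb = std::make_shared<std::optional<int>>();
  return [a = std::move(a), b = std::move(b), ra, rb]()
             -> std::optional<std::pair<int, int>> {
    if (!ra->has_value()) *ra = a();  // poll left side if still pending
    if (!rb->has_value()) *rb = b();  // poll right side if still pending
    if (ra->has_value() && rb->has_value()) {
      return std::make_pair(**ra, **rb);
    }
    return std::nullopt;  // at least one side still pending
  };
}
```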
Concurrently - I have a cool optimization in mind - but it requires that
we can move promises after polling, which is a contract change. I'm
going to work through the set of primitives we have in the coming weeks
and change that contract to enable the optimization.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix sticky-TF behavior such that once we enter TRANSIENT_FAILURE, we do
not leave that state if we get a new address list.
Also, fix handling of subchannels in state TRANSIENT_FAILURE.
Previously, if a subchannel was already in state TRANSIENT_FAILURE when
we wanted to start a connection attempt on it (e.g., because the
subchannel already existed from a different channel, or because it
already existed in the previous subchannel list), we would wait for it
to report IDLE before attempting to connect. This PR changes pick_first
to instead immediately skip the subchannel and move on to the next one.
Now, the only time we wait for a subchannel in TRANSIENT_FAILURE is when
we wrap back around to the first subchannel in the list.
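For illustration, a rough sketch of the revised attempt order, with hypothetical types standing in for the real pick_first internals:
```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real subchannel machinery.
enum class ConnState { kIdle, kConnecting, kReady, kTransientFailure };

struct SubchannelSketch {
  ConnState state = ConnState::kIdle;
  void RequestConnection() { state = ConnState::kConnecting; }
};

// Start a connection attempt at or after `start`, skipping subchannels that
// are already in TRANSIENT_FAILURE instead of waiting for them to go IDLE.
void AttemptNext(std::vector<SubchannelSketch>& list, size_t start) {
  for (size_t i = start; i < list.size(); ++i) {
    if (list[i].state == ConnState::kTransientFailure) continue;  // skip
    list[i].RequestConnection();
    return;
  }
  // We wrapped past the end of the list: only now do we wait for the first
  // subchannel to leave TRANSIENT_FAILURE before trying again.
}
```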
The `google::api::expr::v1alpha1` types are available in
`"@com_google_googleapis//google/api/expr/v1alpha1:expr_proto"`, not in
`"google_type_expr_upb"`.
Port https://github.com/grpc/grpc/pull/33871 to EE's
GrpcPolledFdFactoryPosix.
Another case where we need to raise the 'failed before receive
completed' flag prior to actually failing -- I think once promises are
all rolled out I'd like to consider ways to remove this flag.
As per the title.
@donnadionne
Our current implementations of `Seq` and `TrySeq` leverage some complicated
template stuff to work, which makes them hard to maintain. I've been
thinking about ways to simplify that for some time and had something
like this in mind - using a code generator that's at least a little more
understandable to generate most of the complexity into a file that can be
checked in.
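As with the `Join`/`TryJoin` change above, here's a toy model of what a two-step `Seq` does, using `std::function`-based promises that return `nullopt` while pending; illustrative only, not `grpc_core::Seq`:
```cpp
#include <functional>
#include <memory>
#include <optional>
#include <utility>

using IntPromise = std::function<std::optional<int>()>;

// Seq two steps: poll `first`; once it resolves, build the second promise
// from its result and poll that until it completes.
IntPromise Seq2(IntPromise first, std::function<IntPromise(int)> make_second) {
  auto second = std::make_shared<std::optional<IntPromise>>();
  return [first = std::move(first), make_second = std::move(make_second),
          second]() -> std::optional<int> {
    if (!second->has_value()) {
      auto r = first();
      if (!r.has_value()) return std::nullopt;  // step one still pending
      *second = make_second(*r);                // move on to step two
    }
    return (**second)();  // poll step two
  };
}
```
The type-erased `std::function`s and `shared_ptr` state keep the toy readable; avoiding exactly that overhead is what the template machinery (and now the generator) is for.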
Concurrently - I have a cool optimization in mind - but it requires that
we can move promises after polling, which is a contract change. I'm
going to work through the set of primitives we have in the coming weeks
and change that contract to enable the optimization.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The intercepted call classes expect a timeout as a parameter, but a deadline is
provided instead. Only the UnaryUnary class is correctly created with a
timeout, while the others are still created with deadlines. This commit fixes
the instantiation of the other classes.
We decided not to populate `policy_name` with the HTTP filter name in the
xDS case, so we're removing it from `GenerateServiceConfig`. This will be
consistent across languages. The gRFC
[PR](https://github.com/grpc/proposal/pull/346) has been updated.