PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Otherwise when we fill in peer in `SetCommonEntryFields` it will be
empty/invalid.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The following bugs are fixed:
* Missing ExecCtx in event engine endpoints and listeners
* Ref counting issue with iomgr endpoint which causes crashes in
overloaded situations
The PR includes a test which triggers these bugs by simulating an
overloaded system.
With some delay, this is a PR for
https://github.com/grpc/grpc/issues/32564 (and previously
https://github.com/grpc/grpc/pull/31791).
I looked into adding a regular `py_test` for this change [as
suggested](https://github.com/grpc/grpc/pull/31791#issuecomment-1423245116),
but I am not aware of any effect that the presence of a .pyi stub file
has at runtime, or of any type checking in a .py script that it would
affect. Stub files are only used by type checkers and IDEs. Something
like this would work:
```
import os
import helloworld_pb2
# The stub is expected to sit next to the generated module: foo.py -> foo.pyi
py_file = helloworld_pb2.__file__
pyi_file = py_file + 'i'
self.assertTrue(os.path.exists(pyi_file))
```
But that seems really hacky to me. Instead I created a simple rule test
for `py_proto_library` with Bazel Skylib which tests the declared
outputs for an example `py_proto_library` target. Indirectly, this also
tests that the declared output files are actually generated. Please let
me know if this is sufficient.
Here the recv message batch 103 was returning end of stream.
Per the reasoning in
https://github.com/grpc/proposal/blob/master/L104-core-ban-recv-with-send-status.md
Sending status is the final thing for a call on the server, so requiring
a recv message to complete after we've sent status is getting into, at
best, a gray area in our spec.
Add a strict ordering between that recv and the sending of status to
make a more deterministic test.
fixes b/286708835, b/286727273
Fix #33308
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what is described
here: https://github.com/psf/black/issues/1837. While this feature is
currently experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. I first ran black with the
new string processing so that the generated code merges `"hello" " world"`
string concatenations, then removed `--experimental-string-processing`
for stability and regenerated the code.
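For context, "implicit string concatenation" refers to Python joining adjacent string literals into a single string. A minimal illustration follows; the variable names are made up for this example and do not come from the codebase:

```python
# Adjacent string literals are joined into one string by Python itself.
split_literal = "hello" " world"
merged_literal = "hello world"

# Both spellings produce the same value; only the source layout differs.
assert split_literal == merged_literal
print(split_literal)
```

Merging the literals in source therefore changes readability only, not behavior.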
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
The function `grpc_rb_call_run_batch` has many places that could raise
errors, including in child functions. Since a raised error will longjmp
out of the function, it causes memory leaks because the function
cannot perform any cleanup. This commit fixes the issue by wrapping the
whole function in an `rb_ensure`, which ensures that a cleanup
function is run before the error is propagated upwards.
The function `grpc_rb_server_request_call` has many places that could
raise errors, including in child functions. Since a raised error will
longjmp out of the function, it causes memory leaks because the
function cannot perform any cleanup. This commit fixes the issue by
wrapping the whole function in an `rb_ensure`, which ensures that a
cleanup function is run before the error is propagated upwards.
`cmake_ninja_vs2019` and `default` both use the same
`cmake_ninja_vs2019` configuration, so running both tests is wasteful.
This change removes `cmake_ninja_vs2019` and keeps `default`, which does
the same thing.
This change can cut the disk space consumption in half. With a 250GB disk:
- Pre-test: 267,770,322,944 bytes free
- Post-test: 134,499,295,232 bytes free
~Something about the additional load from #33374 has caused some
entirely unrelated ios tests to fail sporadically. I'd prefer not to
roll back that however as it's discovered real bugs that had been
previously masked.~
These tests have been failing sporadically for some time.
We can track these on the daily flakiness reports, but whilst we
investigate let's just universally mark them as flaky so we don't
confuse folks trying to submit.
Also drop a few deadlines so that tests can run faster (where that's
safe)
Do not clutter the final error we see at the end with the before/after
stats.
#### Examples
###### Expected only status A, but found status B for method M:
```
[ FAILED ] CustomLbTest.test_custom_lb_config
======================================================================
FAIL: test_custom_lb_config (__main__.CustomLbTest)
CustomLbTest.test_custom_lb_config
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/custom_lb_test.py", line 113, in test_custom_lb_config
self.assertRpcStatusCodes(test_client,
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 345, in assertRpcStatusCodes
found_status = helpers_grpc.status_from_int(found_status_int)
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL.
Diff stats:
- method: UNARY_CALL
rpcs_started: 251
result:
(0, OK): 251
```
###### Expected non-zero RPCs with status A for method M.
```
[ FAILED ] AuthzTest.test_plaintext_allow
======================================================================
FAIL: test_plaintext_allow (__main__.AuthzTest)
AuthzTest.test_plaintext_allow
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 224, in test_plaintext_allow
self.configure_and_assert(test_client, 'host-wildcard',
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 204, in configure_and_assert
self.assertRpcStatusCodes(test_client,
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 355, in assertRpcStatusCodes
self.assertGreater(stats.result[expected_status_int],
AssertionError: 0 not greater than 0 : Expected non-zero completed RPCs with status (0, OK) for method EMPTY_CALL.
Diff stats:
- method: EMPTY_CALL
rpcs_started: 13
result: {}
```
Revert "Revert "[core] Add support for vsock transport"
(https://github.com/grpc/grpc/pull/33276)"
This reverts commit
c5ade3011a.
And fix the issue which broke the python build.
@markdroth @drfloob please review this PR. Thank you very much.
---------
Co-authored-by: AJ Heller <hork@google.com>
Fix for Cython build issue in aarch64.
We're seeing this error in aarch64 distribution test:
```
In file included from ./src/core/lib/slice/slice.h:36,
from ./src/core/lib/slice/slice_buffer.h:29,
from ./src/core/lib/transport/transport.h:60,
from ./src/core/lib/channel/channel_stack.h:75,
from ./src/core/lib/channel/call_tracer.h:32,
from src/python/grpcio/grpc/_cython/cygrpc.cpp:2230:
./src/core/lib/slice/slice_refcount.h: In member function 'void grpc_slice_refcount::Ref(grpc_core::DebugLocation)':
./src/core/lib/slice/slice_refcount.h:55:25: error: expected ')' before 'PRIdPTR'
"REF %p %" PRIdPTR "->%" PRIdPTR, this, prev_refs, prev_refs + 1);
^~~~~~~~
)
```
Based on [this
post](https://stackoverflow.com/questions/26182336/priuptr-preprocessor-bug-in-gcc),
it's caused by including `<inttypes.h>` before defining the
`__STDC_FORMAT_MACROS` macro.
`<inttypes.h>` was included in `core/lib/channel/call_tracer.h`, and this
macro should already be defined in `grpc/grpc.h` through
`port_platform.h`, but we're still seeing the issue on aarch64, so we
manually define the macro in this PR.
The approach of doing a recursive function call to expand the if checks
for known metadata names was tripping up an optimization clang has to
collapse that if/then tree into an optimized tree search over the set of
known strings. By unrolling that loop (with a code generator) we start
to present a pattern that clang *can* recognize, and hopefully get some
more stable and faster code generation as a benefit.
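As a rough sketch of the idea, a code generator can emit one flat chain of string comparisons instead of relying on a recursive expansion. Everything here is hypothetical and for illustration only: the name list, the `generate_lookup` helper, and the emitted `LookupIndex` function are assumptions, not the generator or the code this PR actually produces.

```python
# Hypothetical sketch: emit a flat if-chain over a set of known names.
# The list below is only an example input, not gRPC's real metadata table.
KNOWN_NAMES = [":path", ":authority", "grpc-timeout"]


def generate_lookup(names):
    """Return C++ source text for a flat if-chain mapping each name to its index."""
    lines = ["int LookupIndex(absl::string_view key) {"]
    for index, name in enumerate(names):
        # One comparison per known name, all at the same nesting level.
        lines.append(f'  if (key == "{name}") return {index};')
    lines.append("  return -1;  // unknown name")
    lines.append("}")
    return "\n".join(lines)


print(generate_lookup(KNOWN_NAMES))
```

The point of the flat shape is that every comparison sits in one function body at one level, which is the kind of pattern the description above says the compiler can recognize.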
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
On endpoint shutdown, the `AsyncIOState` object was sometimes freed
before the last async read callback execution had run. This resulted in
callback handlers attempting to access a deleted `io_state_` object.
This PR removes the AsyncIOState dependency from the user callback
execution step, since only the callback and its status are needed at
that point.
Previous crashes were non-deterministic, but would reliably pop up once
on most runs of:
```
bazel test --cxxopt='/DEBUG=full' --config=windows_dbg --test_filter='CoreEnd2endTest*' \
--flaky_test_attempts=1 --cache_test_results=no \
--test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=event_engine \
//test/core/end2end:core_end2end_tests@experiment=event_engine_client
```
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
This had been intended to be 500ms for the first round, but
inadvertently got bumped up during some last minute investigations.
Tune it back down, let things settle out, and then see whether we want
to increase it or not.
Improvements to the `LoadBalancerAccumulatedStatsRequest` output. Makes
it readable.
This greatly affects `assertRpcStatusCodes()` output, used in authz and
custom_lb.
No before and after stats, just useful diff stats from now on. Minimal and
readable.
The diff stats also include `rpcs_started` now.
![image](https://github.com/grpc/grpc/assets/672669/a4e38d82-be5a-4f31-9d88-da2bf9712d9b)
Output example:
```
--- Starting subTest __main__.AuthzTest.test_plaintext_allow.01_host_wildcard ---
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC XdsUpdateClientConfigureService.Configure(request=ClientConfigureRequest({'types': ['EMPTY_CALL'], 'metadata': [{'key': 'test', 'value': 'host-wildcard'}]}), timeout=5, wait_for_ready=True)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7] << Received accumulated stats difference. Expecting RPCs with status (0, OK) for method EMPTY_CALL.
- method: EMPTY_CALL
rpcs_started: 13
result:
(0, OK): 14
--- Finished subTest __main__.AuthzTest.test_plaintext_allow.01_host_wildcard ---
```
In case of test failure, it'll still print all stats at the end,
including before and after:
```
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL.
Stats before:
- method: UNARY_CALL
rpcs_started: 2153
result:
(14, UNAVAILABLE): 1674
(0, OK): 479
Stats after:
- method: UNARY_CALL
rpcs_started: 2404
result:
(0, OK): 730
(14, UNAVAILABLE): 1674
Diff stats:
- method: UNARY_CALL
rpcs_started: 251
result:
(0, OK): 251
```
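The relationship between the three sections in the failure output above can be sketched as a subtraction of two snapshots. This is a minimal illustration, not the test driver's implementation: the `diff_stats` helper and the dict layout are assumptions made for this example, and only the numbers come from the output above.

```python
def diff_stats(before, after):
    """Subtract two snapshots of one method's accumulated stats.

    Each snapshot is a hypothetical dict of the form
    {"rpcs_started": int, "result": {status_code: count}}.
    Statuses whose count did not change are left out of the diff.
    """
    result_diff = {}
    for status, count in after["result"].items():
        delta = count - before["result"].get(status, 0)
        if delta != 0:
            result_diff[status] = delta
    return {
        "rpcs_started": after["rpcs_started"] - before["rpcs_started"],
        "result": result_diff,
    }


# Numbers taken from the UNARY_CALL failure output above.
before = {"rpcs_started": 2153, "result": {14: 1674, 0: 479}}
after = {"rpcs_started": 2404, "result": {0: 730, 14: 1674}}
print(diff_stats(before, after))
# prints {'rpcs_started': 251, 'result': {0: 251}}
```

The UNAVAILABLE count is identical in both snapshots, which is why only `(0, OK): 251` appears under "Diff stats".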
And as I was at it, also made `LoadBalancerStatsResponse` nice:
![image](https://github.com/grpc/grpc/assets/672669/b15908a7-bae4-41a0-a2f7-c903e398432a)
We just found out that our current Bazel setup does not support Python
3.11.
This PR updates some dependencies to allow using Bazel with Python 3.11.
Cython:
* Cython [backported the Python 3.11 support change to
0.29.x](https://github.com/cython/cython/issues/4500), but it appears
that the Cython version we are using in Bazel does not include the fix,
so we're using the latest stable version instead.
Gevent:
* The first version of gevent that supports [Python 3.11 is
22.08.0](https://github.com/gevent/gevent/issues/1903#issuecomment-1303227507).
#### Testing
* Tested locally in a Python 3.11 virtual environment: reproduced the
issue and verified that these changes fix it.