Similar to https://github.com/grpc/grpc/pull/33542.
Note that there's a ticket to automatically use the one specified in the
--server_image_canonical flag, but for now we just hardcode.
ref b/261911148, b/282106799.
Along with an experiment this time
Adds a test for the experiments codegen. It updates the codegen to parse
test_experiments.yaml and test_experiments_rollouts.yaml files and
generate test_experiments.h and test_experiments.cc files along with an
experiments_test.cc file. The experiments test verifies the returned
value of IsExperimentEnabled with the expected value.
Add bazel dependency on opentelemetry-cpp.
PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: HannahShiSFB <HannahShiSFB@users.noreply.github.com>
This adds a pre-built library for aarch64 Linux, which will help improve
install speed and avoid build environment issues on the customer side.
@apolcyn @jtattermusch Can you help build and push the new rake compiler
image?
I will update the tag and hash after the image is available.
Manually tested locally:
```
uname -a
Linux u20 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
```
```
time gem install /work/ruby/grpc/pkg/grpc-1.56.0.dev-aarch64-linux.gem
Successfully installed grpc-1.56.0.dev-aarch64-linux
Parsing documentation for grpc-1.56.0.dev-aarch64-linux
Installing ri documentation for grpc-1.56.0.dev-aarch64-linux
Done installing documentation for grpc after 0 seconds
1 gem installed
real 0m22.794s
user 0m17.268s
sys 0m5.156s
```
```
ruby greeter_server.rb &
[1] 319
ruby greeter_client.rb
"Greeting: Hello world"
```
Fixes:
https://github.com/grpc/grpc/issues/31855
https://github.com/grpc/grpc/issues/29489
This reverts commit e107ff5e99.
Noticed some inconsistencies in our keepalive configuration:
* Earlier, even if keepalive pings were disabled, we would be scheduling
keepalive pings at an interval of INT_MAX ms.
* We were not using `g_default_client_keepalive_permit_without_calls` /
`g_default_server_keepalive_permit_without_calls`. They are both false
by default but they can be overridden in
`grpc_chttp2_config_default_keepalive_args`.
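The intent of the two fixes can be sketched roughly as follows. This is a Python illustration only; the real logic lives in the C++ chttp2 transport. `KeepaliveConfig`, `resolve_keepalive`, `maybe_schedule_ping`, and the `DEFAULT_*` constants are hypothetical names, and the sketch assumes `INT_MAX` is the sentinel for "keepalive disabled".

```python
from dataclasses import dataclass
from typing import Callable, Optional

INT_MAX = 2**31 - 1  # assumed sentinel meaning "keepalive pings disabled"

# Hypothetical stand-ins for the process-wide defaults; both are False
# unless something overrides them.
DEFAULT_CLIENT_PERMIT_WITHOUT_CALLS = False
DEFAULT_SERVER_PERMIT_WITHOUT_CALLS = False


@dataclass
class KeepaliveConfig:
    time_ms: int
    permit_without_calls: bool

    @property
    def enabled(self) -> bool:
        return self.time_ms != INT_MAX


def resolve_keepalive(
    is_client: bool,
    time_ms: int = INT_MAX,
    permit_without_calls: Optional[bool] = None,
) -> KeepaliveConfig:
    """Fall back to the per-side default when the caller sets nothing."""
    if permit_without_calls is None:
        permit_without_calls = (
            DEFAULT_CLIENT_PERMIT_WITHOUT_CALLS
            if is_client
            else DEFAULT_SERVER_PERMIT_WITHOUT_CALLS
        )
    return KeepaliveConfig(time_ms, permit_without_calls)


def maybe_schedule_ping(
    config: KeepaliveConfig, schedule: Callable[[int], None]
) -> bool:
    """Schedule a ping only when keepalive is enabled, instead of arming
    a timer for INT_MAX ms."""
    if not config.enabled:
        return False
    schedule(config.time_ms)
    return True
```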
This is a no-op, just reordering `requirements.lock`.
By providing `-r requirements.txt` to `pip freeze` it's able to break up
dependencies required via `requirements.txt`, and sub-dependencies
installed to satisfy them.
I've got a hypothesis that we're losing isolation between test shards
right now for "some reason".
This is a change to reflect test sharding in the port distribution that
we use, in an attempt to alleviate that.
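One way to picture shard-aware port distribution is the sketch below. It is not the code in this PR; `port_range_for_shard`, the base port, and the pool size are all hypothetical values chosen for illustration.

```python
def port_range_for_shard(
    shard_index: int,
    total_shards: int,
    base_port: int = 32768,   # hypothetical start of the port pool
    pool_size: int = 16384,   # hypothetical number of ports in the pool
) -> range:
    """Give each test shard its own disjoint slice of the port pool."""
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in [0, total_shards)")
    ports_per_shard = pool_size // total_shards
    start = base_port + shard_index * ports_per_shard
    return range(start, start + ports_per_shard)
```

With four shards, shard 0 would draw from ports 32768 to 36863 and shard 1 from 36864 to 40959, so two shards never hand out the same port.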
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We should probably cap this so that our customers have a chance of
cloning the repository.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The following bugs are fixed:
* Missing ExecCtx in event engine endpoints and listeners
* Ref counting issue with iomgr endpoint which causes crashes in
overloaded situations
The PR includes a test which triggers these bugs by simulating an
overloaded system.
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what is described
here: https://github.com/psf/black/issues/1837. While this feature is
currently experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. I first ran black with the
new string processing so that the generated code merges these `"hello" "
world"` string concatenations, then removed
`--experimental-string-processing` for stability and regenerated the
code again.
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
`cmake_ninja_vs2019` and `default` both use the same
`cmake_ninja_vs2019` configuration, so running two tests is a waste.
This removes `cmake_ninja_vs2019`, leaving `default`, which does
`cmake_ninja_vs2019`.
This change can cut disk space consumption by half. With a 250GB disk:
- Pre-test: 267,770,322,944 bytes free
- Post-test: 134,499,295,232 bytes free
Do not clutter the final error we see at the end with the before/after
stats.
#### Examples
###### Expected only status A, but found status B for method M:
```
[ FAILED ] CustomLbTest.test_custom_lb_config
======================================================================
FAIL: test_custom_lb_config (__main__.CustomLbTest)
CustomLbTest.test_custom_lb_config
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/custom_lb_test.py", line 113, in test_custom_lb_config
self.assertRpcStatusCodes(test_client,
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 345, in assertRpcStatusCodes
found_status = helpers_grpc.status_from_int(found_status_int)
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL.
Diff stats:
- method: UNARY_CALL
rpcs_started: 251
result:
(0, OK): 251
```
###### Expected non-zero RPCs with status A for method M.
```
[ FAILED ] AuthzTest.test_plaintext_allow
======================================================================
FAIL: test_plaintext_allow (__main__.AuthzTest)
AuthzTest.test_plaintext_allow
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 224, in test_plaintext_allow
self.configure_and_assert(test_client, 'host-wildcard',
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/tests/authz_test.py", line 204, in configure_and_assert
self.assertRpcStatusCodes(test_client,
File "/Users/sergiitk/Development/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 355, in assertRpcStatusCodes
self.assertGreater(stats.result[expected_status_int],
AssertionError: 0 not greater than 0 : Expected non-zero completed RPCs with status (0, OK) for method EMPTY_CALL.
Diff stats:
- method: EMPTY_CALL
rpcs_started: 13
result: {}
```
The approach of doing a recursive function call to expand the if checks
for known metadata names was tripping up an optimization clang has to
collapse that if/then tree into an optimized tree search over the set of
known strings. By unrolling that loop (with a code generator) we start
to present a pattern that clang *can* recognize, and hopefully get some
more stable and faster code generation as a benefit.
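As a rough illustration of the idea (not the generator used in this PR), a code generator can emit one flat chain of string comparisons rather than a recursive expansion. `generate_flat_lookup` and the emitted C++ function shape are hypothetical, and the sample metadata names are only examples.

```python
from typing import Sequence


def generate_flat_lookup(function_name: str, known_names: Sequence[str]) -> str:
    """Emit C++ source with one unrolled `if` per known name.

    The flat chain is the kind of pattern a compiler can recognize and
    turn into an optimized search over the set of known strings.
    """
    lines = [f"int {function_name}(std::string_view key) {{"]
    for index, name in enumerate(known_names):
        lines.append(f'  if (key == "{name}") return {index};')
    lines.append("  return -1;  // not a known name")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_flat_lookup("LookupKnownName", [":path", "te", "user-agent"]))
```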
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Improvements to the `LoadBalancerAccumulatedStatsRequest` output. Makes
it readable.
This greatly affects `assertRpcStatusCodes()` output, used in authz and
custom_lb.
No more before and after stats, just the useful diff stats from now on.
Minimal and readable.
The diff stats also include `rpcs_started` now.
![image](https://github.com/grpc/grpc/assets/672669/a4e38d82-be5a-4f31-9d88-da2bf9712d9b)
Output example:
```
--- Starting subTest __main__.AuthzTest.test_plaintext_allow.01_host_wildcard ---
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC XdsUpdateClientConfigureService.Configure(request=ClientConfigureRequest({'types': ['EMPTY_CALL'], 'metadata': [{'key': 'test', 'value': 'host-wildcard'}]}), timeout=5, wait_for_ready=True)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7:51561] >> RPC LoadBalancerStatsService.GetClientAccumulatedStats(request=LoadBalancerAccumulatedStatsRequest({}), wait_for_ready=True, timeout=600)
[psm-grpc-client-765bfbf868-jqjm7] << Received accumulated stats difference. Expecting RPCs with status (0, OK) for method EMPTY_CALL.
- method: EMPTY_CALL
rpcs_started: 13
result:
(0, OK): 14
--- Finished subTest __main__.AuthzTest.test_plaintext_allow.01_host_wildcard ---
```
In case of test failure, it'll still print all stats at the end,
including before and after:
```
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL.
Stats before:
- method: UNARY_CALL
rpcs_started: 2153
result:
(14, UNAVAILABLE): 1674
(0, OK): 479
Stats after:
- method: UNARY_CALL
rpcs_started: 2404
result:
(0, OK): 730
(14, UNAVAILABLE): 1674
Diff stats:
- method: UNARY_CALL
rpcs_started: 251
result:
(0, OK): 251
```
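The diff stats in the example above can be reproduced with a small sketch like the one below. It is an illustration, not the test driver's implementation: `diff_stats` and the plain-dict layout (status code mapped to count) are assumptions, while the numbers are taken from the output above.

```python
from typing import Dict


def diff_stats(before: Dict, after: Dict) -> Dict:
    """Return per-method stats accumulated between two snapshots.

    Statuses whose count did not change are dropped, which keeps the
    diff minimal.
    """
    result = {}
    for status, count in after["result"].items():
        delta = count - before["result"].get(status, 0)
        if delta:
            result[status] = delta
    return {
        "rpcs_started": after["rpcs_started"] - before["rpcs_started"],
        "result": result,
    }


# UNARY_CALL snapshots from the failure output above (0 = OK, 14 = UNAVAILABLE).
stats_before = {"rpcs_started": 2153, "result": {14: 1674, 0: 479}}
stats_after = {"rpcs_started": 2404, "result": {0: 730, 14: 1674}}
unary_call_diff = diff_stats(stats_before, stats_after)
```

Here `unary_call_diff` holds 251 started RPCs and 251 RPCs with status 0, matching the "Diff stats" section of the log.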
And as I was at it, also made `LoadBalancerStatsResponse` nice:
![image](https://github.com/grpc/grpc/assets/672669/b15908a7-bae4-41a0-a2f7-c903e398432a)
Fixes the issue introduced in https://github.com/grpc/grpc/pull/33104,
where stopping the current run didn't reset `self.time_start_requested`,
`self.time_start_completed`, `self.time_start_stopped`. Because of this,
the subsetting test (the only one [redeploying the client
app](10001d16a9/tools/run_tests/xds_k8s_test_driver/tests/subsetting_test.py (L73C1-L74)))
started failing with:
```py
Traceback (most recent call last):
File "xds_k8s_test_driver/tests/subsetting_test.py", line 76, in test_subsetting_basic
test_client: _XdsTestClient = self.startTestClient(
File "xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 615, in startTestClient
test_client = self.client_runner.run(server_target=test_server.xds_uri,
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_xds_client_runner.py", line 110, in run
super().run()
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_base_runner.py", line 112, in run
raise RuntimeError(
RuntimeError: Deployment psm-grpc-client: has already been started at 2023-05-27T13:47:15.262461
```
This PR:
1. Instead of relying on `time_start_requested` and
`time_start_stopped` to produce GCP links, tracks the run history of
each deployment. This fixes the issue described above and adds support
for listing all past runs executed by a k8s runner.
2. Minor: removes the unnecessary call to `test_client.cleanup()` when
there are no past deployment runs (e.g. at the first iteration of `for i
in range(_NUM_CLIENTS):`).
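A minimal sketch of the run-history idea, assuming a simplified runner: `RunHistory`, `Runner`, and their fields are hypothetical and far smaller than the real k8s runner classes. The point is that stopping a run archives it and clears the current-run state, so the same runner can be started again.

```python
import datetime
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunHistory:
    deployment_id: str
    time_start_requested: datetime.datetime
    time_stopped: Optional[datetime.datetime] = None


class Runner:
    def __init__(self, deployment_name: str) -> None:
        self.deployment_name = deployment_name
        self.run_history: List[RunHistory] = []
        self.current_run: Optional[RunHistory] = None

    def run(self) -> None:
        if self.current_run is not None:
            started = self.current_run.time_start_requested.isoformat()
            raise RuntimeError(
                f"Deployment {self.deployment_name}: "
                f"has already been started at {started}"
            )
        self.current_run = RunHistory(
            self.deployment_name, datetime.datetime.now()
        )

    def stop(self) -> None:
        if self.current_run is None:
            return  # nothing to stop, e.g. before the first run
        self.current_run.time_stopped = datetime.datetime.now()
        # Archive the finished run and reset state so run() works again.
        self.run_history.append(self.current_run)
        self.current_run = None
```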
- switch to json_object_loader for config parsing
- use `absl::string_view` instead of `const char*` for cert provider
names
- change cert provider registry to use a map instead of a vector
- remove unused mesh_ca cert provider factory
Allow for multiple `--grpc_experiments`, `--grpc_trace` command line
arguments to be added, accumulate them, and provide them to gRPC as one
thing.
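The accumulation behavior can be sketched with Python's `argparse` as below. This is an analogy only: the PR itself changes gRPC's own flag handling, and joining with commas assumes gRPC takes these settings as comma-separated lists. `build_parser` and `merge_flag_values` are hypothetical helpers.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--grpc_experiments", action="append", default=[])
    parser.add_argument("--grpc_trace", action="append", default=[])
    return parser


def merge_flag_values(values: List[str]) -> str:
    """Join all occurrences into one comma-separated setting."""
    return ",".join(value for value in values if value)


args = build_parser().parse_args(
    ["--grpc_experiments=a,b", "--grpc_experiments=c", "--grpc_trace=http"]
)
experiments = merge_flag_values(args.grpc_experiments)
trace = merge_flag_values(args.grpc_trace)
```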
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
`tools/run_tests/sanity/check_absl_mutex.sh` was broken: a missing paren
crashed the script when run locally. It's not yet clear why our sanity
checks were not complaining about this; `run_tests.py` does not save the
log.
I've noticed we add the cleanup hook after setting up the
infrastructure. Thus, if infra setup fails, the cleanup won't run.
This fixes it, and adds extra checks to not call
`cls.test_client_runner` if it's not set.
Fail test if client or server pods restarted during test.
#### Testing
Tested locally, test will fail with message similar to:
```
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/google/home/xuanwn/workspace/xds/grpc/tools/run_tests/xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 501, in tearDown
))
AssertionError: 5 != 0 : Server pods unexpectedly restarted {sever_restarts} times during test.
----------------------------------------------------------------------
Ran 1 test in 886.867s
```
Better logging for `assertRpcStatusCodes`.
(got tired of looking up the status names)
#### Unexpected status found
Before:
```
AssertionError: AssertionError: Expected only status 15 but found status 0 for method UNARY_CALL:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
}
}
```
After:
```
AssertionError: Expected only status (15, DATA_LOSS), but found status (0, OK) for method UNARY_CALL:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
}
}
```
#### No traffic with expected status
Before:
```
AssertionError: 0 not greater than 0
```
After:
```
AssertionError: 0 not greater than 0 : Expected non-zero RPCs with status (15, DATA_LOSS) for method UNARY_CALL, got:
stats_per_method {
key: "UNARY_CALL"
value {
result {
key: 0
value: 251
}
result {
key: 15
value: 0
}
}
}
```
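The "(15, DATA_LOSS)" formatting shown above can be sketched as follows. `StatusCode` here is a small hypothetical enum holding only the three codes that appear in these logs, and `status_pretty` is a hypothetical helper, not the driver's `helpers_grpc` API.

```python
import enum


class StatusCode(enum.IntEnum):
    # Only the codes appearing in the logs above; the full gRPC list is longer.
    OK = 0
    UNAVAILABLE = 14
    DATA_LOSS = 15


def status_pretty(code: int) -> str:
    """Render a status as "(int, NAME)" so nobody has to look up the name."""
    try:
        name = StatusCode(code).name
    except ValueError:
        name = "UNKNOWN_TO_THIS_SKETCH"  # placeholder for codes not listed here
    return f"({code}, {name})"


message = (
    f"Expected only status {status_pretty(15)}, "
    f"but found status {status_pretty(0)} for method UNARY_CALL"
)
```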
Before this change, `Found subchannel in state READY` and `Channel to
xds:///psm-grpc-server:61404 transitioned to state ` would dump the full
channel/subchannel. In implementations that expose ChannelData.trace
(e.g. Go), this would add 300 extra lines of log.
Now we print brief repr-like channel/subchannel info:
```
Found subchannel in state READY: <Subchannel subchannel_id=9 target=10.110.1.44:8080 state=READY>
Channel to xds:///psm-grpc-server:61404 transitioned to state READY: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=READY>
```
Also while waiting for the channel, we log channel_id now too:
```
Waiting to report a READY channel to xds:///psm-grpc-server:61404
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=TRANSIENT_FAILURE>
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=TRANSIENT_FAILURE>
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=TRANSIENT_FAILURE>
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=TRANSIENT_FAILURE>
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=TRANSIENT_FAILURE>
Server channel: <Channel channel_id=2 target=xds:///psm-grpc-server:61404 state=READY>
```
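A brief formatter producing the shape shown above could look like this sketch. `ChannelInfo`, `SubchannelInfo`, and the two `*_pretty` helpers are hypothetical stand-ins for the channelz messages and the driver's own helpers; only the output format is taken from the logs.

```python
from dataclasses import dataclass


@dataclass
class ChannelInfo:
    channel_id: int
    target: str
    state: str


@dataclass
class SubchannelInfo:
    subchannel_id: int
    target: str
    state: str


def channel_pretty(channel: ChannelInfo) -> str:
    """One-line summary instead of dumping the whole channel message."""
    return (
        f"<Channel channel_id={channel.channel_id} "
        f"target={channel.target} state={channel.state}>"
    )


def subchannel_pretty(subchannel: SubchannelInfo) -> str:
    return (
        f"<Subchannel subchannel_id={subchannel.subchannel_id} "
        f"target={subchannel.target} state={subchannel.state}>"
    )
```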
Similar to what we already do in other test suites:
- Try cleaning up resources three times.
- If unsuccessful, don't fail the test and just log the error. The
cleanup script should be the one to deal with this.
ref b/282081851
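The retry policy described above can be sketched as below. `cleanup_with_retries` is a hypothetical helper, not the function added in this PR; it tries the cleanup up to three times and logs instead of raising when every attempt fails.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def cleanup_with_retries(cleanup: Callable[[], None], attempts: int = 3) -> bool:
    """Run cleanup up to `attempts` times; never fail the test on error."""
    for attempt in range(1, attempts + 1):
        try:
            cleanup()
            return True
        except Exception as error:  # broad on purpose: cleanup is best-effort
            logger.warning(
                "Cleanup attempt %d of %d failed: %s", attempt, attempts, error
            )
    # Leave leftover resources to the separate cleanup script.
    logger.error("Cleanup failed after %d attempts, giving up", attempts)
    return False
```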