These checks have not been needed since way back in #22100, but they
were never removed, and they've even propagated to a bunch of new
policies via copy-paste.
Fix at-head tests (this is a missing piece of
https://github.com/grpc/grpc/pull/32905), which fail with the following error:
```
/var/local/git/grpc/tools/run_tests/helper_scripts/build_python.sh: line 126: python3.8: command not found
```
While a proper fix is on the way, this reduces the number of
duplicated container logs in the xds test server/client pod logs.
The issue is that we only wait between stream restarts when an exception
is caught, but an exception isn't the only reason the stream breaks. Another
reason is the main container being shut down by k8s. In that situation,
we essentially do:
```py
while True:
    try:
        restart_stream()
        read_all_logs_from_pod_start()
    except Exception:
        logger.warning('error')
        wait_seconds(1)
```
This PR changes it to:
```py
while True:
    try:
        restart_stream()
        read_all_logs_from_pod_start()
    except Exception:
        logger.warning('error')
    finally:
        wait_seconds(5)
```
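A runnable sketch of the new loop, assuming the stream operations are passed in as hypothetical callables, the loop is bounded by a hypothetical `max_restarts` so it terminates, and `sleep` is injectable so the wait can be observed. The point is that the wait now happens on every iteration, whether the stream broke with an exception or simply ended because k8s shut the container down.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def follow_pod_logs(
    restart_stream: Callable[[], None],
    read_all_logs_from_pod_start: Callable[[], None],
    max_restarts: int,
    wait_seconds: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical log follower that always waits between stream restarts.

    max_restarts bounds the loop for illustration; the real loop runs forever.
    """
    for _ in range(max_restarts):
        try:
            restart_stream()
            read_all_logs_from_pod_start()
        except Exception:  # broad on purpose: any failure means "restart"
            logger.warning('Log stream broke with an error', exc_info=True)
        finally:
            # Runs on both paths: after an exception, and after the stream
            # ended normally (e.g. the container was shut down by k8s).
            sleep(wait_seconds)
```

Passing `sleep=waits.append` records each wait instead of sleeping, which makes it easy to check that a cleanly ending stream no longer restarts in a tight loop.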
Valgrind will now only fail the build on definite leaks, not "possible"
leaks. A trivial example that fails the PHP valgrind test as it is
configured today:
```cpp
namespace {
grpc_core::NoDestruct<grpc_core::BackOff> g_backoff{
    grpc_core::BackOff::Options()};
}  // namespace
```
Valgrind detects a possible leak because BackOff contains an
absl::BitGen, which calls `new` through a chain of ownership
indirection. This is what Valgrind calls an [interior
pointer](https://valgrind.org/docs/manual/mc-manual.html#mc-manual.options:~:text=%22Possibly%20lost%22.%20This,have%20interior%2Dpointers.).
Our CI will no longer fail on such possible leaks.
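The exact flags used by the PHP valgrind test aren't shown here. One way to get this behavior with stock Valgrind is `--errors-for-leak-kinds=definite`, which counts only definite leaks as errors in a full leak search (the default is `definite,possible`), combined with `--error-exitcode` so that counted errors fail the run. A minimal sketch using a hypothetical `build_valgrind_cmd` helper; it is not necessarily how gRPC's test scripts wire this up.

```python
from typing import List


def build_valgrind_cmd(program: List[str], fail_on_possible: bool = False) -> List[str]:
    """Hypothetical helper: wrap `program` in a Valgrind memcheck invocation."""
    leak_kinds = 'definite,possible' if fail_on_possible else 'definite'
    return [
        'valgrind',
        '--tool=memcheck',
        # errors-for-leak-kinds only applies to a full leak search.
        '--leak-check=full',
        # Only these leak kinds are counted as errors.
        f'--errors-for-leak-kinds={leak_kinds}',
        # Exit non-zero if any counted error was found.
        '--error-exitcode=1',
        *program,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; a possible leak such as the `absl::BitGen` interior pointer above would then be reported but would not fail the build.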
In some places, the G name was not updated properly in the
previous release.
Oops, I missed important changes from
https://github.com/grpc/grpc/pull/32712. It also turned out that there
are two problems I couldn't fix at this point:
- Windows Bazel RBE linker error: this may be caused by how Bazel 6
invokes the build toolchain, but it's not clear. As a workaround, I
pinned Bazel 5 via `OVERRIDE_BAZEL_VERSION=5.4.1`.
- The `rules_pods` rule used to fetch CronetFramework from CocoaPods has
an incompatibility with the built-in Apple toolchain
(https://github.com/bazel-xcode/PodToBUILD/issues/232). I couldn't find
a workaround, so I ended up disabling all tests that depend on
this target.
Fix the `python_alpine` test failure:
```
fatal: detected dubious ownership in repository at '/var/local/jenkins/grpc'
To add an exception for this directory, call:
git config --global --add safe.directory /var/local/jenkins/grpc
```
Fix https://github.com/grpc/grpc/issues/32638
This paves the way for making pick_first the universal leaf policy (see
#32692), which will be needed for the dualstack design. That change will
require changing pick_first to see both the raw connectivity state and
the health-checking connectivity state of a subchannel, so that we can
enable health checking when pick_first is used underneath round_robin
without actually changing the pick_first connectivity logic (currently,
pick_first always disables health checking). To make it possible to do
that, this PR moves the health checking code out of the subchannel and
into a separate API using the same data-watcher mechanism that was added
for ORCA OOB calls.
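To illustrate the shape of the split, here is a hypothetical Python sketch; the real code is C++ in gRPC core, and none of these class or method names come from it. The subchannel only reports raw connectivity, and health checking becomes an optional data watcher that a policy attaches only when it wants health-adjusted state. The health check itself is simulated by a `healthy` flag.

```python
from typing import Callable, List


class Subchannel:
    """Hypothetical subchannel: knows only raw connectivity state."""

    def __init__(self) -> None:
        self.state = 'IDLE'
        self._state_watchers: List[Callable[[str], None]] = []
        self._data_watchers: List['HealthWatcher'] = []

    def watch_connectivity(self, cb: Callable[[str], None]) -> None:
        self._state_watchers.append(cb)

    def add_data_watcher(self, watcher: 'HealthWatcher') -> None:
        # Same kind of attachment point an ORCA OOB watcher would use.
        self._data_watchers.append(watcher)

    def set_state(self, state: str) -> None:
        self.state = state
        for cb in self._state_watchers:
            cb(state)
        for watcher in self._data_watchers:
            watcher.on_connectivity_change(state)


class HealthWatcher:
    """Hypothetical data watcher layering health status on raw connectivity."""

    def __init__(self, cb: Callable[[str], None]) -> None:
        self._cb = cb
        self.healthy = True  # result of the last (simulated) health check

    def on_connectivity_change(self, state: str) -> None:
        # A READY connection that fails its health check is reported as
        # TRANSIENT_FAILURE; other states pass through unchanged.
        if state == 'READY' and not self.healthy:
            self._cb('TRANSIENT_FAILURE')
        else:
            self._cb(state)
```

In this model, pick_first would keep using only the raw state from `watch_connectivity`, while a parent such as round_robin could attach a `HealthWatcher` to the same subchannel and see the health-adjusted state.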
Change was created by the release automation script. See go/grpc-release.
---------
Co-authored-by: Stanley Cheung <stanleycheung@google.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: Yijie Ma <yijiem.main@gmail.com>
Co-authored-by: apolcyn <apolcyn@google.com>
Co-authored-by: Jan Tattermusch <jtattermusch@google.com>
`tearDownClass` is not executed when `setUpClass` fails. In the URL Map
test suite, this means a test client that failed to start is never
cleaned up.
This PR changes the URL Map test suite to register a custom cleanup
callback with `addClassCleanup` instead of relying on `tearDownClass`.
Unlike `tearDownClass`, cleanup callbacks are executed even when
`setUpClass` fails.
ref b/276761453
The PR also creates a separate BUILD target for:
- chttp2 context list
- iomgr buffer_list
- iomgr internal errqueue
This allows the context list to be included as a standalone
dependency for EventEngine implementations.
To help debug https://github.com/grpc/grpc/pull/32748, change the
test so that it logs what the problem is.
As Protobuf is going to support Cord to reduce memory copies when
[de]serializing Cord fields, gRPC is going to leverage it. This
implementation is based on the internal one, but it's slightly modified
to use only the public APIs of Cord.
This test verifies that `global_stats.IncrementHttp2MetadataSize(0)` works.
Followup for https://github.com/grpc/grpc/pull/31141.
IWYU and clang-tidy have been "moved" to a separate kokoro job, but as
it turns out the sanity job still runs all of `[sanity, clang-tidy,
iwyu]`, which makes the grpc_sanity jobs very slow.
The issue is that grpc_sanity selects tasks that have the "sanity" label,
and clang-tidy and iwyu currently still do.
This can be verified with:
```
tools/run_tests/run_tests_matrix.py -f sanity --dry_run
Will run these tests:
run_tests_sanity_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/sanity_linux_dbg_native/sponge_log.xml --report_suite_name sanity_linux_dbg_native -l sanity -c dbg --iomgr_platform native --report_multi_target"
run_tests_clang-tidy_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/clang-tidy_linux_dbg_native/sponge_log.xml --report_suite_name clang-tidy_linux_dbg_native -l clang-tidy -c dbg --iomgr_platform native --report_multi_target"
run_tests_iwyu_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/iwyu_linux_dbg_native/sponge_log.xml --report_suite_name iwyu_linux_dbg_native -l iwyu -c dbg --iomgr_platform native --report_multi_target"
```
This PR should fix this by removing the umbrella "sanity" label from
clang-tidy and iwyu.
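A simplified model of why removing the label is enough, assuming hypothetical task names and label sets (the real `run_tests_matrix.py` data structures differ): filtering with `-f sanity` keeps every task that carries the label, so dropping the umbrella label takes clang-tidy and iwyu out of the sanity job while they stay selectable through their own labels.

```python
from typing import Dict, List, Set


def select_tasks(tasks: Dict[str, Set[str]], wanted_label: str) -> List[str]:
    """Return the sorted names of tasks whose labels include `wanted_label`."""
    return sorted(name for name, labels in tasks.items() if wanted_label in labels)


# Hypothetical label sets before the fix: every task carries 'sanity'.
BEFORE = {
    'sanity_linux_dbg_native': {'sanity', 'linux'},
    'clang-tidy_linux_dbg_native': {'sanity', 'clang-tidy', 'linux'},
    'iwyu_linux_dbg_native': {'sanity', 'iwyu', 'linux'},
}

# After the fix: the umbrella 'sanity' label is removed from clang-tidy and iwyu.
AFTER = {
    'sanity_linux_dbg_native': {'sanity', 'linux'},
    'clang-tidy_linux_dbg_native': {'clang-tidy', 'linux'},
    'iwyu_linux_dbg_native': {'iwyu', 'linux'},
}
```

With `BEFORE`, selecting `sanity` returns all three tasks, matching the dry run above; with `AFTER`, it returns only `sanity_linux_dbg_native`.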
Initial Bazel tests for C# protoc and grpc_protoc_plugin.
This initial test just generates code from the proto file and compares
the generated code against expected files.
I've put the tests in `test/csharp/codegen`, as that is similar to where
the C++ tests are placed, but they could be moved to
`src/csharp` if that is a better place.
Further tests can be added once the initial framework for the tests is
agreed upon.
- Added `fuzzer_input.proto` and `NetworkInput` proto message
- Migrated client_fuzzer and server_fuzzer to proto fuzzer
- Migrated the existing corpus and verified that the code coverage (e.g.
chttp2) stays the same
This will probably need to be cherry-picked, given the number of files changed.
@sampajano
This change allows metadata keys to set the appropriate compression
algorithm in the hpack encoder, without needing to change the source
text of the hpack encoder.
We'll leverage this with #32650 to allow some important but
Google-internal metadata to be compressed appropriately.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This bug occurred when the same xDS server was configured twice in the
same bootstrap config, once in an authority and again as the top-level
server. In that case, we were incorrectly failing to de-dup them and
were creating a separate channel for the LRS stream than the one that
already existed for the ADS stream. We fix this by canonicalizing the
server keys the same way in both cases.
As a separate follow-up item, I will work on trying to find a better way
to key these maps that does not suffer from this kind of fragility.
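A minimal sketch of the idea, assuming each server config is a JSON-like dict and using a sorted-key JSON dump as the canonical key. This is a hypothetical choice for illustration; the PR doesn't say how gRPC builds its keys. Because the authority entry and the top-level entry go through the same function, they resolve to a single shared channel.

```python
import json
from typing import Any, Dict


def canonical_server_key(server: Dict[str, Any]) -> str:
    """Serialize a server config deterministically (sorted keys, no spaces)."""
    return json.dumps(server, sort_keys=True, separators=(',', ':'))


class XdsChannelPool:
    """Hypothetical pool that shares one channel per distinct xDS server."""

    def __init__(self) -> None:
        self._channels: Dict[str, object] = {}

    def get_channel(self, server: Dict[str, Any]) -> object:
        key = canonical_server_key(server)
        if key not in self._channels:
            self._channels[key] = object()  # stand-in for a real channel
        return self._channels[key]
```

If the ADS and LRS paths computed keys differently (for example, one preserving field order and the other not), the same server would get two channels, which is the bug described above.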
Previously, we didn't configure the failureThreshold, so it used its
default value. The final `startupProbe` looked like this:
```json
{
  "startupProbe": {
    "failureThreshold": 3,
    "periodSeconds": 3,
    "successThreshold": 1,
    "tcpSocket": {
      "port": 8081
    },
    "timeoutSeconds": 1
  }
}
```
As a result, the total time before k8s killed the container was 3 failed
probes (`failureThreshold`) × 3 seconds between probes (`periodSeconds`)
= 9 seconds total (±3 seconds waiting for the probe response).
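The arithmetic above, as a small sketch. The second helper shows how one might size `failureThreshold` for a given expected startup time; both helper names and the 60-second figure in the usage note are illustrative assumptions, not values from this change. Like the estimate above, it ignores `initialDelaySeconds` and per-probe `timeoutSeconds`.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Roughly how long k8s tolerates failing startup probes before a kill."""
    return failure_threshold * period_seconds


def min_failure_threshold(expected_startup_seconds: int, period_seconds: int) -> int:
    """Smallest failureThreshold whose budget covers the expected startup time."""
    return math.ceil(expected_startup_seconds / period_seconds)
```

With the defaults shown above, `startup_budget_seconds(3, 3)` is 9 seconds; covering a hypothetical 60-second startup with a 3-second period would need `min_failure_threshold(60, 3)`, i.e. 20 probes.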
This greatly affected the PSM Security test server, some implementations
of which wait for the ADS stream to be configured before they start
listening on the maintenance port. This led to the server container
being killed ~7 times before a successful startup:
```
15:55:08.875586 "Killing container with a grace period"
15:53:38.875812 "Killing container with a grace period"
15:52:47.875752 "Killing container with a grace period"
15:52:38.874696 "Killing container with a grace period"
15:52:14.874491 "Killing container with a grace period"
15:52:05.875400 "Killing container with a grace period"
15:51:56.876138 "Killing container with a grace period"
```
These extra delays led to PSM security tests timing out.
ref b/277336725
This may be used to quickly verify the code coverage of a modified test
(e.g. a fuzzer) locally.
Example:
```
# Build and run target; the raw profile will be written to $LLVM_PROFILE_FILE when the program exits
$ bazel build --config=dbg --config=fuzzer_asan --config=coverage //test/core/end2end/fuzzers:api_fuzzer
$ LLVM_PROFILE_FILE="api_fuzzer.profraw" bazel-bin/test/core/end2end/fuzzers/api_fuzzer test/core/end2end/fuzzers/api_fuzzer_corpus/*
# Create coverage report
$ llvm-profdata-14 merge -sparse api_fuzzer.profraw -o api_fuzzer.profdata
$ llvm-cov-14 report ./bazel-bin/test/core/end2end/fuzzers/api_fuzzer --instr-profile=api_fuzzer.profdata
```
Sample report:
f94e444f25/gistfile1.txt
One caveat: the binary needs to be statically linked, e.g. by specifying
`linkstatic = 1` on the BUILD target.
See https://clang.llvm.org/docs/SourceBasedCodeCoverage.html for more
info.
We shouldn't depend on exactly how small the compression algorithm
makes the bytes; doing so is causing flakiness internally.
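The gRPC tests in question are C++ and use gRPC's own compression; this Python sketch just uses the standard library's `zlib` to contrast an assertion that holds for any correct compressor with a fragile one that pins the compressed size. The helper name is hypothetical.

```python
import zlib


def check_compression_roundtrip(payload: bytes) -> bytes:
    """Assert only properties that hold for any correct compressor."""
    compressed = zlib.compress(payload)
    # Robust: decompression restores the exact input.
    assert zlib.decompress(compressed) == payload
    # Fragile (avoid): assert len(compressed) == 17
    # The exact size depends on the algorithm, its version, and its level.
    return compressed
```

A test written this way keeps passing if the compression library's output changes, which is the property the change is after.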
It has already started hitting the time limit, resulting in continuous
failures. https://github.com/grpc/grpc/pull/32603 is believed to
contribute to this increase, but let's bump the limit first and revisit
the issue later.