Removes noise from the cleanup/teardown ops.
#### GCP APIs
In GCP APIs, change log level for delete operations that failed because the resource doesn't exist (API 404) from `info` to `debug`. Framework's logging philosophy is to only log external operations (e.g. APIs, RPCs). If no error logged, the op is assumed successful.
In the deletion case, is still possible to discriminate between whether the op was actually performed by observing the `Waiting %s sec for %s operation id: %s` log message.
#### K8s APIs
In K8s APIs:
- For delete operations that failed because the resource doesn't exist (API 404) the log level is changed from `info` to `debug`
- For delete operations that failed for any other reason, the log level is changed from `info` to `warning`
- When `wait_for_deletion` is enabled (it's the default) the delete operation will be confirmed with `logger.info("<resource_kind> %s deleted", name)`. Previously it logged at the `debug` level.
Closes#35131
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35131 from sergiitk:psm-interop-debug-log-on-delete-404 f6629e5132
PiperOrigin-RevId: 587851692
We dont need this check anymore .
Deleting the check from the yaml and the sh file.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#35161
PiperOrigin-RevId: 587784923
Add a variant of `Spawn` that returns a promise that can be awaited by another activity.
This allows us to simply implement complex cross-activity synchronization.
(necessary building block for #34740)
Also adds an inter-activity latch as a building block to test this work.
Closes#34744
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/34744 from ctiller:ninteen-ninety-nine 19074b255f
PiperOrigin-RevId: 582450643
`StatusFlag` acts like a status, but is just a boolean (we don't want to
accidentally treat a boolean as something that indicates failure in case
it's not)
Similarly `ValueOrFailure` looks like `StatusOr` but reduces the failure
space to one value.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We're seeing timeout errors in our distribution test:
https://fusion2.corp.google.com/invocations/dfa9aaa9-e94b-479e-8c28-a39d98d277bc/targets/github%2Fgrpc%2Fbuild_artifacts_python;config=default/tests.
Sample error:
`2023-11-10 09:12:19,512 TIMEOUT:
build_artifact.python_windows_x86_Python39_32bit [pid=2320,
time=2700.1sec]`
This change increases timeout for windows build artifact jobs to 7200s,
which aligns with all other jobs (except `linux_extra`, which is 3600s).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This commit upgrades gRPC to protobuf v25.0 and makes some fixes to
account for upb changes. One major change is that upb has been merged
into the protobuf repo, so we can now drop the separate `@upb`
dependency. Another is that `.upb.c` files no longer exist and there are
new `.upb_minitable.h` and `.upb_minitable.c` files. The longer
filenames exceeded a Windows restriction, so to work around that I
renamed the `upb-generated` directory to just `upb-gen`, and likewise
for `upbdefs-generated`.
Design is documented at
[go/windows-dns-resolver-issue](http://go/windows-dns-resolver-issue)
(note that the design doc is slightly outdated regarding the shared
ownership model of the virtual socket that was implemented in
13bd2b404e).
Passed `//test/cpp/naming:resolver_component_tests_runner_invoker` and
`//test/cpp/naming:cancel_ares_query_test`:
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:resolver_component_tests_runner_invoker
INFO: Analyzed target //test/cpp/naming:resolver_component_tests_runner_invoker (1 packages loaded, 8 targets configured).
INFO: Found 1 test target...
INFO: From Compiling src/core/lib/event_engine/windows/windows_engine.cc:
C:\bazel6\execroot\com_github_grpc_grpc\src/core/lib/channel/channel_args.h(287): warning C4312: 'reinterpret_cast': conversion from 'int' to 'void *' of greater size
Target //test/cpp/naming:resolver_component_tests_runner_invoker up-to-date:
bazel-bin/test/cpp/naming/resolver_component_tests_runner_invoker.exe
INFO: Elapsed time: 230.374s, Critical Path: 228.54s
INFO: 9 processes: 2 internal, 7 local.
INFO: Build completed successfully, 9 total actions
//test/cpp/naming:resolver_component_tests_runner_invoker PASSED in 221.2s
Executed 1 out of 1 test: 1 test passes.
```
```
C:\Users\yijiem\projects\grpc>bazel --output_base=C:\bazel6 test --dynamic_mode=off --verbose_failures --test_env=GRPC_EXPERIMENTS=event_engine_dns --test_env=GRPC_VERBOSITY=debug --test_env=GRPC_TRACE=cares_resolver --enable_runfiles=yes --nocache_test_results //test/cpp/naming:cancel_ares_query_test
INFO: Analyzed target //test/cpp/naming:cancel_ares_query_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/cpp/naming:cancel_ares_query_test up-to-date:
bazel-bin/test/cpp/naming/cancel_ares_query_test.exe
INFO: Elapsed time: 49.656s, Critical Path: 48.00s
INFO: 6 processes: 2 internal, 4 local.
INFO: Build completed successfully, 6 total actions
//test/cpp/naming:cancel_ares_query_test PASSED in 43.0s
Executed 1 out of 1 test: 1 test passes.
```
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Modeled after mutexes in the Rust ecosystem: the mutex owns the data
provided, and acquisition of the mutex returns a handle with which to
manipulate that data.
This fits in nicely with the execution environment we've established
whereby we may want to pass the lock from lambda to lambda for some
time.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Add a config to experiments & rollouts to allow dependent experiments to
be flagged.
We're getting past the point where it's possible to reason about which
experiments need to be turned off if we disable some other experiment,
and so this provides some additional rollout safety.
Can be specified in both experiments and rollouts: experiments.yaml
makes the most sense and we should default to it, but rollouts.yaml lets
us put dependencies between internal & external dependencies internally
and that's gonna be a little useful.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This is a follow-up PR of #34191, which handles the error condition of
endpoints failed to write/read in chaotic-good client transport.
This PR needs to be merged after #34191.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Bradley Hess <bdhess@google.com>
Co-authored-by: AJ Heller <hork@google.com>
Now we log pretty much identical message:
```
client_app.py:320] [psm-grpc-client-7768f6597-nvtgl] Detected successful calls to xDS control plane: trafficdirector.googleapis.com:443
client_app.py:292] [psm-grpc-client-7768f6597-nvtgl] ADS: Detected successful calls to xDS control plane trafficdirector.googleapis.com:443
```
This PR will log the latest channel state in the first message, similar
to what we do in `find_server_channel_with_state`:
52c08f4498/tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py (L367-L371)
After the change:
```
client_app.py:320] [psm-grpc-client-6566595cff-8wrfd] Detected successful calls to xDS control plane trafficdirector.googleapis.com:443, channel: <Channel channel_id=4 target=trafficdirector.googleapis.com:443 call_started=9 calls_failed=8 state=READY>
client_app.py:292] [psm-grpc-client-6566595cff-8wrfd] ADS: Detected successful calls to xDS control plane trafficdirector.googleapis.com:443
```
Eliminate spam from clang-format script.
Also add a shuffle to the file list - saves about a second of runtime on
my machine by making sure clusters of hard files are distributed evenly
amongst CPUs.
This adds the directory reloader implementation of the CrlProvider. This
will periodically reload CRL files in a directory per [gRFC
A69](https://github.com/grpc/proposal/pull/382)
Included in this is the following:
* A public API to create the `DirectoryReloaderCrlProvider`
* A basic directory interface in gprpp and platform specific impls for
getting the list of files in a directory (unfortunately prior C++17,
there is no std::filesystem, so we have to have platform specific impls)
* The implementation of `DirectoryReloaderCrlProvider` takes an
event_engine and a directory interface. This allows us to test using the
fuzzing event engine for time mocking, and to implement a test directory
interface so we avoid having to make temporary directories and files in
the tests. This is notably not in `include`, and the
`CreateDirectoryReloaderCrlProvider` is the only way to construct one
from the public API, so we don't expose the event engine and directory
details to the user.
---------
Co-authored-by: gtcooke94 <gtcooke94@users.noreply.github.com>
This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.
This only affects pull requests for the `Bazel RBE Build Tests` job. The
equivalent master CI job will still build all end2end tests, including
experiments.
The `bazel_build_with_strict_warnings_linux` job was taking nearly an
hour to complete. This change brings it down to ~20m by splitting up the
build targets into 7 separate RBE actions.
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
This reduces clang-format duration to under 10s on my 128 core machine.
Previously I stopped the process at 10 minutes. Major changes:
Eliminates 18,000 files
```
find ... \
-and -not -path '*/cmake/build/*' \
-and -not -path '*/huffman_geometries/*' \
```
Also, this now runs `$CPU_COUNT` number of parallel clang-format
processes with a roughly equal number of files. Previously, this script
was formatting one file at a time in separate processes (`xargs -n 1`).
Add a `PodMonitoring` resource type to the PSM interop testing
framework. This is needed so that GMP (Google Managed Prometheus) can
scrape the matching GKE pods Prometheus endpoint for Prometheus metrics.
Relands #34785, which was reverted in #34818.
The first commit is the revert. The second commit removes the gtest
dependency from the xds_server library, which should address the
testonly problem internally.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Currently it is very easy to use the `TlsCredentialsOptions` in such a
way that it produces a memory leak. For example, the code block
```
{
TlsCredentialsOptions options;
}
```
produces a memory leak. This PR fixes up the ownership bugs in this
class and its `grpc_tls_credentials_options`, the C-core analogue.
In case of test fails, the clean up script will try delete some resource
we didn't create and resulting lots of 404 errors, we should exclude
those status code since we have specific handling for 404.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Earlier, the grpc message-length prefix for outgoing data messages was
incorrectly being counted towards `data_bytes` instead of
`framing_bytes`. This PR fixes it.
Note that the incoming stats collection properly attributes the grpc
message-length prefix to `framing_bytes`.
This change will affect all stats plugins (OpenCensus and OpenTelemetry)
that make use of this information for metrics.