PSM Interop: Local dev various improvements
- Cleanup resources on ctrl+c
- Add startup probes to address the issue with port forwarding starting
before the workload listens on a port
- Remove misleading restartPolicy: it's silently ignored by k8s
- Extra debug message with port-forwarding command
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- sort source files to ensure stable ordering
- generate one source file per line
together this should produce diffs that are much more readable by humans
when sources get added/removed to/from protobuf (and
make_grpcio_tools.py is used to regenerate).
First step in the modernization of our RBE stack (see
go/rbe-tech-debt-notes).
- Get rid of the deprecated rbe_autoconfig and start using
[rbe_configs_gen](https://github.com/bazelbuild/bazel-toolchains#rbe_configs_gen---cli-tool-to-generate-configs)
+ check in the generated toolchain configs.
- Switch from marketplace.gcr.io/google/rbe-ubuntu16-04 to
marketplace.gcr.io/google/rbe-ubuntu18-04 (this image is still not owned
by us, but at least it's newer and demonstrates how a switch to a newer
docker image is done).
- provide script for generating the linux RBE toolchain configs.
- cleanup RBE configuration in the bazelrc files used for remote build
This check only works if all handshake RPCs have an OK status, and it's
racey e.g. if the client is cancelling handshake RPCs (being when an RPC
is cancelled, termination of the RPC at the client is asynchronous from
termination at the server, so the client can resume the queue before the
server RPC completes).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
With iomgr, this test is effectively rate limited by ExecCtx and the
single thread running pollset_work, which results in thousands of tiny
writes happening before every read. A small set of _synchronous_ 8k
reads then dominate the read-side of the test. This is an efficient
balance.
With the Windows EventEngine, the fully asynchronous, multi-threaded
reads and writes end up alternating roughly 1:1, meaning that a read
callback is executed for every tiny handful of bytes, tens of thousands
of times. Compared to the Posix EventEngine, without things like TCP_INQ
and/or recvmsg's timeout, I don't know of any great signal for how much
data can safely be received in a batch (e.g., we don't want to wait for
data that will never come, and we don't want to run callbacks for 2
bytes over and over again if we have KB in the pipe).
I believe the Windows EventEngine is WAI. I can significantly improve
this test performance by artificially slowing the reader down (adding a
>= 1ms sleep), but I believe that improves this use case to the
detriment of all others.
This fixes a bug where connections cannot be made in IPv4-only
environments. To test, hard-code `IsIpv6LoopbackAvailable` to return
false.
Example Error:
`
D0309 00:29:49.514359445 235 tcp_client.cc:67] (event_engine)
EventEngine::Connect Status: INTERNAL: socket: Address family not
supported by protocol
`
This can also be reproduced in gRPC's benchmark environment, which does
not have IPv6 enabled.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
To support TPC feature for BYOID (3PI), we need to remove the validation
the pattern of impersonation endpoints, sts endpoints and token info
endpoints since they are different in TPC regions.
A security review is already passed at b/261634871
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
When the handshaker_service_url is in "host:port" format such as it
normally is when using ALTS in GCE (in which case it comes from then
this makes no difference as the authority and the URL are the same. But
when different URLs are used, the correct authority to use is not always
the same as the URL. For example if the URL is unix:///some/path then
the correct authority is "localhost". This is correctly computed by
grpc_core::UnixResolverFactory and stored as the channel's default
authority, but we throw that away when we override the authority for
individual RPCs.
Note indeed that the majority of other callers of grpc_channel_create_*
pass nullptr for the host/authority argument.
It looks like nobody ever created ALTS redentials from Python with a
list of accepted service accounts before.
Simple reproduction:
```
import grpc
grpc.alts_channel_credentials(None) # works
grpc.alts_channel_credentials(['foo']) # fails
```
Without this change, generates this error:
```
[...]
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 414, in _cython.cygrpc.channel_credentials_alts
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 403, in _cython.cygrpc.ALTSChannelCredentials.__cinit__
TypeError: expected bytes, str found
```
(And the error cannot be worked around by the caller by passing a bytes
object from the Python side: you still get the same error.)
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Discovered via `bazel test
--test_env=GRPC_EXPERIMENTS=event_engine_client
//test/core/iomgr:endpoint_pair_test`. CI experiments can be enabled
generally on Windows once a few fixes and improvements are completed.
The `method_exists` function requires a fully qualified class name to be
sent to check if a method exists. The current class was missing the
namespace, which means the function always returns `false`. In our
application this caused the credentials to be loaded many times over,
which ate up some CPU. This bug fix ensures that this is only run once
per request.
This prevents deadlock against wire writer issues.
Currently, there are some `transport_stream_receiver_` callbacks
triggered by NDK binder may acquire `WireReaderImpl::mu_` first then
`WireWriterImpl::write_mu_`. We don't like see this.
We have this problem since some client and server are in the same
process. The behavior of NDK binder seems more aggressive when the Tx
and Rx are in the same process.
Follow-up to https://github.com/grpc/grpc/pull/32229.
https://github.com/grpc/grpc/pull/32229 incremented the `ExecCtx` count
unconditionally. It was previously impossible for a thread to exit
`IncExecCtxCount` while `fork_complete_` was `false`. These same threads
then went on to _decrement_ `count_` while the fork was still in
progress, putting `count_` well below its expected range ([0, 1] while
blocking and [2, inf) while not blocking). This resulted in cases where
`count_` would be stuck at a negative number with a thread infinitely
looping through `IncExecCtxCount`.
This PR instead opts EE threads out of ExecCtx counting entirely. They
handle clean-up of their threads separately through a separate set of
handlers registered by an entirely separate invocation of
`pthread_atfork`. This resolves the issue pointed out in [this
comment](https://github.com/grpc/grpc/issues/31885#issuecomment-1426445192).
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
Third try for #32466.
This adds an interop client / server for GCP Observability integration
testing.
Everything is new here with no refactor. Plan is to get this in first
before trying to refactor out the flags.
Avoids some compilation problems on older MSVC's, opens the door for
some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This test has been occasionally failing on CI with "Bus Error" crashes
while requiring the grpc shared library.
These crashes have been unreproducible locally. Let's continue debugging
(b/266212253) but skip this on CI.
Internal bug ref: b/271334230
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Note that the `StartRetryTimerLocked` code path is not covered by test.
And I'm having trouble adding a test case to cover it. Seems like we
need the health service (default_health_check_service) to shutdown and
end the server-side streaming RPC (and send the trailing metadata) but
keep the subchannel (and subchannel_stream_client) alive so that it
could retry a Health.Watch RPC.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Further Grpc.Tools tests
- TestCharactersInName - proto files with dots and numbers in their
names
- TestExtraOptions - test setting AdditionalProtocArguments,
OutputOptions and GrpcOutputOptions
- TestGrpcServices - test setting GrpcServices to "none", "client",
"server" and "both"
- TestSetOutputDirs - test setting OutputDir and GrpcOutputDir
Upgrade boringssl to the latest "master-with-bazel"
- use the `'USE_HEADERMAP' => 'NO'` fix for ObjC
- update the key for asm optimizations on mac/apple in python's setup.py
This PR depends on monterey fixes here:
https://github.com/grpc/grpc/pull/32493 and the boringssl's build
simplification
https://boringssl-review.googlesource.com/c/boringssl/+/56465.
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Refactor C++ interop test client flags into the common
`client_helper.h/cc`. This is needed by the observability testing PR
#32466
We need the `ABSL_DECLARE_FLAG` in the header file so that we can share
that across different implementation.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Make remaining objC jobs compatible with kokoro monterey workers and
prepare for boringssl upgrade.
The changes here are taken from https://github.com/grpc/grpc/pull/32357,
but they should be merged in a separate PR
(we need the changes to be able to upgrade to monterey anyway and
there's no reason to make the boringssl upgrade PR more complicated by
bundling more fixes into it).
I've checked that the grpc_basictests_objc_examples and
grpc_ios_binary_size are green if switched to monterey.
Unfortunately it's hard to make grpc_basictests_objc_examples pass on
both monterey and mojave, so I suggest merging this PR at the same time
as CL to upgrade the kokoro jobs to monterey.
- that way both PR and continuous runs will remain green
- older branches would need a backport anyway
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
AFAICT, travis hasn't been disabled for the grpc/grpc repository a long
time ago.
https://travis-ci.org/github/grpc/grpchttps://travis-ci.com/github/grpc/grpc
I'm happy to be proven wrong if travis is still being used, but if not,
we should delete the travis config to avoid confusing people (I've seen
some recently open PRs attempting to make modification to this file).
Alongside https://github.com/grpc/grpc/pull/32496, this makes this test
behave the same on all platforms.
FWIW, I verified this causes us to see the previous lock cycle problem
in https://github.com/grpc/grpc/pull/32491 on linux - originally that
lock cycle was only on mac, because of environmental differences between
mac and linux in CI.