Discovered via `bazel test
--test_env=GRPC_EXPERIMENTS=event_engine_client
//test/core/iomgr:endpoint_pair_test`. CI experiments can be enabled
generally on Windows once a few fixes and improvements are completed.
The `method_exists` function requires a fully qualified class name to be
sent to check if a method exists. The current class was missing the
namespace, which means the function always returns `false`. In our
application this caused the credentials to be loaded many times over,
which ate up some CPU. This bug fix ensures that this is only run once
per request.
This prevents deadlock against wire writer issues.
Currently, there are some `transport_stream_receiver_` callbacks
triggered by NDK binder may acquire `WireReaderImpl::mu_` first then
`WireWriterImpl::write_mu_`. We don't like see this.
We have this problem since some client and server are in the same
process. The behavior of NDK binder seems more aggressive when the Tx
and Rx are in the same process.
Follow-up to https://github.com/grpc/grpc/pull/32229.
https://github.com/grpc/grpc/pull/32229 incremented the `ExecCtx` count
unconditionally. It was previously impossible for a thread to exit
`IncExecCtxCount` while `fork_complete_` was `false`. These same threads
then went on to _decrement_ `count_` while the fork was still in
progress, putting `count_` well below its expected range ([0, 1] while
blocking and [2, inf) while not blocking). This resulted in cases where
`count_` would be stuck at a negative number with a thread infinitely
looping through `IncExecCtxCount`.
This PR instead opts EE threads out of ExecCtx counting entirely. They
handle clean-up of their threads separately through a separate set of
handlers registered by an entirely separate invocation of
`pthread_atfork`. This resolves the issue pointed out in [this
comment](https://github.com/grpc/grpc/issues/31885#issuecomment-1426445192).
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
.
This adds an interop client / server for GCP Observability integration
testing.
Everything is new here with no refactor. Plan is to get this in first
before trying to refactor out the flags.
Avoids some compilation problems on older MSVC's, opens the door for
some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This test has been occasionally failing on CI with "Bus Error" crashes
while requiring the grpc shared library.
These crashes have been unreproducible locally. Let's continue debugging
(b/266212253) but skip this on CI.
Internal bug ref: b/271334230
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Note that the `StartRetryTimerLocked` code path is not covered by test.
And I'm having trouble adding a test case to cover it. Seems like we
need the health service (default_health_check_service) to shutdown and
end the server-side streaming RPC (and send the trailing metadata) but
keep the subchannel (and subchannel_stream_client) alive so that it
could retry a Health.Watch RPC.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Further Grpc.Tools tests
- TestCharactersInName - proto files with dots and numbers in their
names
- TestExtraOptions - test setting AdditionalProtocArguments,
OutputOptions and GrpcOutputOptions
- TestGrpcServices - test setting GrpcServices to "none", "client",
"server" and "both"
- TestSetOutputDirs - test setting OutputDir and GrpcOutputDir
Upgrade boringssl to the latest "master-with-bazel"
- use the `'USE_HEADERMAP' => 'NO'` fix for ObjC
- update the key for asm optimizations on mac/apple in python's setup.py
This PR depends on monterey fixes here:
https://github.com/grpc/grpc/pull/32493 and the boringssl's build
simplification
https://boringssl-review.googlesource.com/c/boringssl/+/56465.
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Make remaining objC jobs compatible with kokoro monterey workers and
prepare for boringssl upgrade.
The changes here are taken from https://github.com/grpc/grpc/pull/32357,
but they should be merged in a separate PR
(we need the changes to be able to upgrade to monterey anyway and
there's no reason to make the boringssl upgrade PR more complicated by
bundling more fixes into it).
I've checked that the grpc_basictests_objc_examples and
grpc_ios_binary_size are green if switched to monterey.
Unfortunately it's hard to make grpc_basictests_objc_examples pass on
both monterey and mojave, so I suggest merging this PR at the same time
as CL to upgrade the kokoro jobs to monterey.
- that way both PR and continuous runs will remain green
- older branches would need a backport anyway
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
AFAICT, travis hasn't been disabled for the grpc/grpc repository a long
time ago.
https://travis-ci.org/github/grpc/grpchttps://travis-ci.com/github/grpc/grpc
I'm happy to be proven wrong if travis is still being used, but if not,
we should delete the travis config to avoid confusing people (I've seen
some recently open PRs attempting to make modification to this file).
Alongside https://github.com/grpc/grpc/pull/32496, this makes this test
behave the same on all platforms.
FWIW, I verified this causes us to see the previous lock cycle problem
in https://github.com/grpc/grpc/pull/32491 on linux - originally that
lock cycle was only on mac, because of environmental differences between
mac and linux in CI.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This compiles for //:grpc, but not for tests yet.
It's the right approach though - @veblush hoping this is something you
can pick up and finish off.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This fixes a very strange TSAN flake seen in an internal test
(b/268292646), which seems to have been introduced in
https://github.com/grpc/grpc/pull/32326.
I've included an example of the TSAN failure below, for future
reference. I don't fully understand what is causing this failure. It
looks like the async callback from a previous picker update caused the
pick to be re-queued but then got stuck draining queued callbacks in its
`ExecCtx` instance, while the async callback from the next picker update
had already failed the call. But the part I don't understand is why the
call combiner cancellation callback wound up scheduled on the `ExecCtx`
instance from the first thread -- that seems likely to be caused by
either the `WorkSerializer` or maybe `ExecCtx` work-stealing, but the
exact details are likely to be hard to nail down.
Switching from `EventEngine::Run()` back to `ExecCtx::Run()` seems to
fix the flake, so that's what I'm doing in this PR. I find this
work-around deeply dissatisfying, especially since I do not fully
understand the root cause of the problem. However, given that we are
working toward eliminating the `FilterBasedLoadBalancedCall` code
entirely as part of the promise conversion, this problem is probably not
worth further investigation.
```
WARNING: ThreadSanitizer: data race (pid=7406)
Read of size 8 at 0x7b8800037558 by thread T47:
#0
main testing/base/internal/gunit_main.cc:86:10 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a)
Thread T40 'EventManager_Default' (tid=7447, running) created by main thread at:
#0
main testing/base/internal/gunit_main.cc:86:10 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a)
SUMMARY: ThreadSanitizer: data race third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:2780:5 in grpc_core::ClientChannel::FilterBasedLoadBalancedCall::~FilterBasedLoadBalancedCall()
```
1. All greeter servers now support flag `--port` to customize the listen
port.
2. All client implementation now have `--target` flag to specify the
server location.
3. ABSL is used to parse the flags to reduce the amount of boilerplate
code in the examples.
Fixes: #26989
The docs change is extracted from
https://github.com/grpc/grpc/pull/31869 and
https://github.com/grpc/grpc/pull/31938.
The actual upgrade of boringssl is in progress, but in the meantime we
can at least make sure the instructions are up-to-date.
I'll also update the internal counterpart (cl/501499368)
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
This filter was originally written only for the C++ wrapped layer, but
we have plans to use this for Python (and maybe other wrapped languages
too in the future.)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
I've replaced all use of `InternalError` here because none of these
scenarios would necessarily merit a bug or outage report.
Identified in the fuchsia test suite: calling the Listener's
`on_shutdown` method with anything other than `absl::OkStatus()` would
fail some assertions in the Posix-specialized client test suite if the
Oracle were implemented similarly. It _should_ fail the same way in the
listener test suite, but the statuses are ignored. I've fixed that.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This fixes the problems identified while building with clang-cl on
Windows, with build arguments `/std:c++14 /W4`
Passes internal checks: cl/511562057
----
We can't yet enable a clang-cl build as part of our continuous
integration tests due to a few issues:
protobuf fails an `unused-parameter` warning check in v4.21 (the current
pinned version) on Windows. The upgrade to v4.22 is evidently painful
and in progress. Without maintaining a patch against protobuf, or
disabling warnings-as-errors somehow for the protobuf code alone, we'll
need to upgrade our dependency before we can automate the clang-cl build
for Windows.
Next, our Windows CI environment does not have clang installed. There
has been some work over the past year to create custom kokoro images,
but that work has apparently stalled after trading hands a few times.
Using our current images, installing clang every time we run the job may
be our best bet (likely from precompiled binaries that we host
ourselves), but it will eat up more CI resources.
Finally, some of the default build configurations are incorrect for
clang-cl. For example `-Wall` in clang-cl translates roughly to
`-Weverything` in clang linux, whereas `-W4` in clang-cl translates more
closely to `-Wall -Wextra`. This configuration in the gRPC bazel build
is not currently platform-specific, it will need to be updated.
Similarly, `-std=c++14` is an unknown argument on Windows (should be
`/std:c++14`), and should not be in the bazelrc. This will likely need
the same platform-specific support.
With the `--copt="-std=c++14"` setting in the bazel.rc file as it is
today, MSVC builds have complained for every cc file:
```
cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
```
This adds thousands of lines of noise to Windows builds, and hides
useful warnings. Using the `/std:c++14` flag on MSVC (and clang-cl) gets
us the desired result.