Discovered via `bazel test
--test_env=GRPC_EXPERIMENTS=event_engine_client
//test/core/iomgr:endpoint_pair_test`. CI experiments can be enabled
generally on Windows once a few fixes and improvements are completed.
Some potentially surprising deployment problems can cause `EMFILE` to be
hit. For example, file descriptor limits can easily be reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- Debian's default limit of 1024 file descriptors is in effect.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
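A minimal sketch of the idea of treating `EMFILE` specially, assuming a hypothetical `Severity` enum and helper names (`SeverityForSocketError`, `DescribeSocketError`) that are not part of gRPC's logging API:

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical severity levels for illustration; not gRPC's logging API.
enum class Severity { kDebug, kError };

// Chooses a log severity for a failed socket call based on errno.
// EMFILE (per-process file descriptor limit reached) is promoted to ERROR
// so that the deployment problem is visible in the logs.
Severity SeverityForSocketError(int err) {
  return err == EMFILE ? Severity::kError : Severity::kDebug;
}

// Builds a message that points at the likely cause of the failure.
std::string DescribeSocketError(int err) {
  std::string msg = std::strerror(err);
  if (err == EMFILE) {
    msg += " (file descriptor limit reached; consider raising the limit)";
  }
  return msg;
}
```

The actual change may hook into a different code path; the sketch only shows the errno check and the severity promotion.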
Third try for #32466.
This adds an interop client / server for GCP Observability integration
testing.
Everything is new here with no refactor. Plan is to get this in first
before trying to refactor out the flags.
Avoids some compilation problems on older MSVC versions and opens the
door for some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Refactor C++ interop test client flags into the common
`client_helper.h/cc`. This is needed by the observability testing PR
#32466
We need the `ABSL_DECLARE_FLAG` in the header file so that we can share
the flags across different implementations.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Alongside https://github.com/grpc/grpc/pull/32496, this makes this test
behave the same on all platforms.
FWIW, I verified this causes us to see the previous lock cycle problem
in https://github.com/grpc/grpc/pull/32491 on linux - originally that
lock cycle was only on mac, because of environmental differences between
mac and linux in CI.
This compiles for //:grpc, but not for tests yet.
It's the right approach though - @veblush hoping this is something you
can pick up and finish off.
This filter was originally written only for the C++ wrapped layer, but
we have plans to use it for Python (and maybe other wrapped languages
in the future).
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
I've replaced all use of `InternalError` here because none of these
scenarios would necessarily merit a bug or outage report.
Identified in the fuchsia test suite: calling the Listener's
`on_shutdown` method with anything other than `absl::OkStatus()` would
fail some assertions in the Posix-specialized client test suite if the
Oracle were implemented similarly. It _should_ fail the same way in the
listener test suite, but the statuses are ignored. I've fixed that.
While debugging another problem, I had some doubts about `Seq`, so I
expanded the tests we have to try to isolate the problem (so far without
success, so I think the original problem was elsewhere).
For stats, the StackDriver/OpenCensus API allows setting the
MonitoredResource directly, so use that.
For tracing, there is no explicit MonitoredResource to use, so just
insert it into the attributes for a span.
Enforce a minimum value for the `refresh_interval_sec_` for the
`FileWatcherCertificateProvider`. There have been issues found when this
is set to 0, and the security team discussed and agreed that 0 should
not be a valid value for this use-case.
I made the `refresh_interval_sec_` public to make it easy to test - I
didn't immediately see an easy way around this. I found `FRIEND_TEST`
exists for accessing private members, but I didn't see that used
anywhere in grpc. If there is a better solution to this, please let me
know.
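A minimal sketch of enforcing a floor on the refresh interval. The constant `kMinRefreshIntervalSec`, its value, and the helper `ClampRefreshIntervalSec` are assumptions for illustration; the minimum that the provider actually enforces is not stated here.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed floor, for illustration only; not taken from the gRPC source.
constexpr int64_t kMinRefreshIntervalSec = 1;

// Returns the requested interval, raised to the minimum if it is too small.
// A requested value of 0 (or a negative value) is never passed through.
int64_t ClampRefreshIntervalSec(int64_t requested_sec) {
  return std::max(requested_sec, kMinRefreshIntervalSec);
}
```

Clamping in one helper also gives a test a place to check the behavior without reading a private member.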
This test is flaky only with iomgr; this change will likely fix it.
Relands #32385 (reverted in #32419) with fixes.
The Windows build is clean on a test cherrypick: cl/511291828
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Return `Timeout(kMaxHours, Unit::kHours)` if the value is about to
overflow in `DivideRoundingUp`.
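The overflow guard can be sketched on plain integers. This is an illustration under assumptions, not the actual `Timeout` code: `DivideRoundingUpSaturating` is a hypothetical name, and the `saturated` argument stands in for the `Timeout(kMaxHours, Unit::kHours)` result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rounding-up division that saturates instead of overflowing.
// Returns `saturated` when dividend + divisor - 1 would exceed int64_t.
int64_t DivideRoundingUpSaturating(int64_t dividend, int64_t divisor,
                                   int64_t saturated) {
  assert(dividend >= 0 && divisor > 0);
  // Check before adding, so the addition itself can never overflow.
  if (dividend > std::numeric_limits<int64_t>::max() - (divisor - 1)) {
    return saturated;
  }
  return (dividend + divisor - 1) / divisor;
}
```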
The `XdsFaultInjectionMaxFault` test has seen a few flakes since #32326
was merged. I believe the flakiness is caused by the fact that when a
large number of RPCs are queued up before the resolver result comes in,
those RPCs are now re-processed in parallel instead of sequentially,
which can cause us to delay more RPCs than we should due to the
`max_faults` setting. To fix this, we change the test to ensure that the
channel is connected (i.e., the resolver result has already been
returned) before we start sending a large number of concurrent RPCs.
Although this is the only test that I've seen flakes in, I've made this
same change consistently to all fault injection tests that are creating
a large number of concurrent RPCs, since the same flake could affect any
of them.
This code is not plumbed through yet, but it provides the core
infrastructure needed to detect the proper GCP environment resources
needed to set up the labels/attributes/resources for stats, tracing and
logging.
Details on how the various environment resources are set up have been
derived by looking at Java's cloud logging library and OpenTelemetry's
future plans. (This could be better explained in an offline review,
since some links are internal.)
Requesting @veblush for a full review and @markdroth for a structural
review.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
A handful of problems were identified while writing the
WindowsEventEngine Listener. To make the listener review easier, these
fixes can be landed separately.
This is built upon https://github.com/grpc/grpc/pull/32376
Problems that are fixed in this PR:
* `OnConnectCompleted` held a Mutex while calling the user callback,
which can deadlock.
* The WinSocket and some associated data need to remain alive after the
Endpoint is destroyed, since Windows IOCP still needs to use some of that
data. Endpoint destruction and socket shutdown are now decoupled, with
the socket managed by a shared_ptr.
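The first problem in the list above follows a common pattern: a callback invoked under a lock can re-enter the object and deadlock. A minimal sketch of the usual remedy, using a hypothetical `Connector` class with `std::mutex` rather than the WindowsEventEngine types:

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a connector; not the WindowsEventEngine class.
class Connector {
 public:
  explicit Connector(std::function<void(bool)> on_connect)
      : on_connect_(std::move(on_connect)) {}

  void OnConnectCompleted(bool ok) {
    std::function<void(bool)> cb;
    {
      // Update state and take ownership of the callback under the lock.
      std::lock_guard<std::mutex> lock(mu_);
      cb = std::move(on_connect_);
      on_connect_ = nullptr;
      done_ = true;
    }
    // The lock is released here, so the callback may call back into
    // this object (for example done()) without deadlocking.
    if (cb) cb(ok);
  }

  bool done() {
    std::lock_guard<std::mutex> lock(mu_);
    return done_;
  }

 private:
  std::mutex mu_;
  std::function<void(bool)> on_connect_;
  bool done_ = false;
};
```

If the callback were invoked inside the locked scope, the call to `done()` from within it would try to lock the same non-recursive mutex again.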
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
There were some rollback conflicts, so this isn't a pure rollback.
This reverts commit ba0e55f539.
Clean up and remove the iOS C++ Cronet test.
To test manually:
./tools/bazel test //src/objective-c/tests:CppCronetTests
@sampajano
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Rollforward #32346 with some fixes in
1e88193edd
To be merged after #31448, #32110, #32094
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The pooled allocator currently has an ABA issue in the allocation path.
This change should fix that - algorithm is described reasonably well in
the PR.
* [flake] Fix max connection age
If the thread sending the request gets descheduled for too long (suppose
CI is under duress!) then the request will not get sent before max
connection age hits, and we'll see the client request fail *without*
reaching the server.
* further fix
* Add info about ca cert used to verify chain.
The tsi_peer object will now contain the subject of the root/ca cert
that was used to verify the peer's chain during a handshake.
* temp investigation
* Fix issues relating to overlapping CRL callback
* formatting on ssl_transport_security.cc
* Swap ca_cert naming
* Use preverify_ok instead of numbers
* Continue some renaming, addressing pr comments
* Removed early return if peer property setting fails
* Continue renaming
* clang-tidy
* Fix clang problem
* clang fixes
* Add null check in tests
* More PR changes. Behavior change to include root cert extract when TSI_REQUEST_CLIENT_CERTIFICATE_AND_VERIFY
* Add intermediate ca, leaf cert, and test with them
* clang-tidy
* Basic formatting
* Add new keys to build for export
* Add new cert files to test BUILD
* build file style fix
* changes for chain test
* clang-format
* build clean
* Add $ to lines of code in README
* Add directive about X509_STORE_CTX_get0_chain
* formatting
These tests are failing because they're running with too few threads;
however, if we give them enough threads to catch bugs, they're flaky.
Remove them and get the team some bandwidth back.
* [http] Don't drop connections on metadata limit exceeded
* remove bad test
* Automated change: Fix sanity tests
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* Revert "Revert "Revert "Revert "server: introduce ServerMetricRecorder API and move per-call reporting from a C++ interceptor to a C-core filter (#32106)" (#32272)" (#32279)" (#32293)"
This reverts commit 1f960697c5.
* Do not create CallMetricRecorder if call is null.