Discovered via `bazel test
--test_env=GRPC_EXPERIMENTS=event_engine_client
//test/core/iomgr:endpoint_pair_test`. CI experiments can be enabled
generally on Windows once a few fixes and improvements are completed.
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
Avoids some compilation problems on older MSVC's, opens the door for
some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Alongside https://github.com/grpc/grpc/pull/32496, this makes this test
behave the same on all platforms.
FWIW, I verified this causes us to see the previous lock cycle problem
in https://github.com/grpc/grpc/pull/32491 on linux - originally that
lock cycle was only on mac, because of environmental differences between
mac and linux in CI.
This compiles for //:grpc, but not for tests yet.
It's the right approach though - @veblush hoping this is something you
can pick up and finish off.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
I've replaced all use of `InternalError` here because none of these
scenarios would necessarily merit a bug or outage report.
Identified in the fuchsia test suite: calling the Listener's
`on_shutdown` method with anything other than `absl::OkStatus()` would
fail some assertions in the Posix-specialized client test suite if the
Oracle were implemented similarly. It _should_ fail the same way in the
listener test suite, but the statuses are ignored. I've fixed that.
I had some doubts about `Seq` debugging another problem, so expanded the
tests we have to try and isolate the problem (so far without success, so
I think the original problem was elsewhere).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Enforce a minimum value for the `refresh_interval_sec_` for the
`FileWatcherCertificateProvider`. There have been issues found when this
is set to 0, and the security team discussed and agreed that 0 should
not be a valid value for this use-case.
I made the `refresh_interval_sec_` public to make it easy to test - I
didn't immediately see an easy way around this. I found `FRIEND_TEST`
exists for accessing private members, but I didn't see that used
anywhere in grpc. If there is a better solution to this, please let me
know.
Relands #32385 (reverted in #32419) with fixes.
The Windows build is clean on a test cherrypick: cl/511291828
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Return `Timeout(kMaxHours, Unit::kHours)` if the value is about to
overflow in `DivideRoundingUp`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
A handful of problems were identified while writing the
WindowsEventEngine Listener. To make the listener review easier, these
fixes can be landed separately.
This is built upon https://github.com/grpc/grpc/pull/32376
Problems that are fixed in this PR:
* `OnConnectCompleted` held a Mutex while calling the user callback,
which can deadlock.
* The WinSocket and some associated data needs to remain alive after the
Endpoint destroyed, since Windows IOCP still needs to use some of that
data. Endpoint destruction and socket shutdown are now decoupled, with
the socket managed by a shared_ptr.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
There were some rollback conflicts, so this isn't a pure rollback.
This reverts commit ba0e55f539.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Rollforward #32346 with some fixes in
1e88193edd
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
To be merged after #31448#32110#32094
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
The pooled allocator currently has an ABA issue in the allocation path.
This change should fix that - algorithm is described reasonably well in
the PR.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
* [flake] Fix max connection age
If the thread sending the request gets descheduled for too long (suppose
CI is under duress!) then the request will not get sent before max
connection age hits, and we'll see the client request fail *without*
reaching the server.
* further fix
* Add info about ca cert used to verify chain.
The tsi_peer object will now contain the subject of the root/ca cert
that was used to verify the peer's chain during a handshake.
* temp investigation
* Fix issues relating to overlapping CRL callback
* formatting on ssl_transport_security.cc
* Swap ca_cert naming
* Use preverify_ok instead of numbers
* Continue some renaming, addressing pr comments
* Removed early return if peer property setting fails
* Continue renaming
* clang-tidy
* Fix clang problem
* clang fixes
* Add null check in tests
* More PR changes. Behavior change to include root cert extract when TSI_REQUEST_CLIENT_CERTIFICATE_AND_VERIFY
* Add intermediate ca, leaf cert, and test with them
* clang-tidy
* Basic formatting
* Add new keys to build for export
* Add new cert files to test BUILD
* build file style fix
* changes for chain test
* clang-format
* build clean
* Add $ to lines of code in README
* Add directive about X509_STORE_CTX_get0_chain
* formatting
* [http] Dont drop connections on metadata limit exceeded
* remove bad test
* Automated change: Fix sanity tests
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* [channel_args] Use c++ channel args during channel init
Previously we were converting to C and then back to C++ for each
filter... this ought to save some CPU time during connection
establishment.
* Automated change: Fix sanity tests
* cpp channel filters
* Automated change: Fix sanity tests
* iwyu
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* Revert "Revert "xds_override_host LB: add support for draining state (#31985)" (#32189)"
This reverts commit 5222eafdc1.
* fixup: Add missing dependency to grpc_xds_client targetZ
* WRR: port StaticStrideScheduler to OSS
* WIP
* Automated change: Fix sanity tests
* fix build
* remove unused aliases
* fix another type mismatch
* remove unnecessary include
* move benchmarks to their own file, and don't run it on windows
* Automated change: Fix sanity tests
* add OOB reporting
* generate_projects
* clang-format
* add config parser test
* clang-tidy and minimize lock contention
* add config defaults
* add oob_reporting_period config field and add basic test
* Automated change: Fix sanity tests
* fix test
* change test to use basic RR
* WIP: started exposing peer address to LB policy API
* first WRR test passing!
* small cleanup
* port RR fix to WRR
* test helper refactoring
* more test helper refactoring
* WIP: trying to fix test to have the right weights
* more WIP -- need to make pickers DualRefCounted
* fix timer ref handling and get tests working
* clang-format
* iwyu and generate_projects
* fix build
* add test for OOB reporting
* keep only READY subchannels in the picker
* add file missed in a previous commit
* fix sanity
* iwyu
* add weight expiration period
* add tests for weight update period and OOB reporting period
* Automated change: Fix sanity tests
* lower bound for timer interval
* consistently apply grpc_test_slowdown_factor()
* cache time in test
* add blackout_period tests
* avoid some unnecessary copies
* clang-format
* add field to config test
* simplify orca watcher tracking
* attempt to fix build
* iwyu
* generate_projects
* update xds proto dependency
* add xDS LB policy entry to registry
* add "_experimental" suffix to policy name
* update LB policy name and remove debug log
* add env var protection
* generate_projects
* gen_upb_api
* WRR: update tests to cover qps plumbing
* WIP
* Automated change: Fix sanity tests
* more WIP
* basic WRR e2e test working
* add OOB test
* add xDS WRR e2e test
* clang-format
* fix sanity
* ignore duplicate addresses
* Automated change: Fix sanity tests
* add new tracer to doc/environment_variables.md
* retain scheduler state across pickers
* Automated change: Fix sanity tests
* use separate mutexes for scheduler and timer
* sort addresses to avoid index churn
* remove fetch_sub for wrap around in RR case
Co-authored-by: markdroth <markdroth@users.noreply.github.com>