This is the initial change for the chaotic-good client transport read
path, a follow-up to the client transport write path PR #33876.
Handling endpoint failures in the transport is still pending. It will be
added once we have the inter-activity pipe with a close function.
- Fixes support for the same address being present more than once in the
address list, which was accidentally broken in #34244.
- Changes the call attribute to encode the hash as an integer instead of
a string.
(I first had to solve a "dubious file ownership" error that was
happening inside the grpc_repo_archive action when using the concurrent
wrapper.)
Example:
```
tools/docker_runners/examples/concurrent_bazel.sh --bazelrc=tools/remote_build/linux.bazelrc test --genrule_strategy=remote,local --workspace_status_command=tools/bazelify_tests/workspace_status_cmd.sh //tools/bazelify_tests/test:runtests_csharp_linux_dbg
```
It was possible for threads that call `WinSocket::NotifyOnRead` and
`WinSocket::NotifyOnWrite` to race against IOCP poller threads, causing
poller events to be missed.
In the most common usage, in some thread (E), the Endpoint would make an
async (overlapped) read or write using `WSARecv` or `WSASend`
respectively, then use the socket's `NotifyOn*` methods to have
callbacks executed when data was ready. If data was already available,
those callbacks would be scheduled for execution immediately. Meanwhile,
if overlapped events came in for some socket, some IOCP poller thread
(P) would inform the socket that data was ready, and if notification
callbacks were already present, they would be scheduled for execution
immediately. It was possible for thread (E) to see no data available,
and thread (P) to not see any notification callbacks registered. This
resulted in registered callbacks that would never be called, for data
that had already been received.
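The missed-event window described above can be illustrated with a small model. This is only a sketch of one common way to close such a window (making "check for data, else register" and "check for a callback, else remember the event" mutually atomic under a lock); it is not the actual `WinSocket` code. `ReadyNotifier` and `run_callback` are hypothetical names introduced for the illustration.

```python
import threading


class ReadyNotifier:
    """Toy model of one direction (read or write) of a socket."""

    def __init__(self, run_callback):
        self._lock = threading.Lock()
        self._ready = False       # set when the poller thread (P) saw an event
        self._callback = None     # set when the endpoint thread (E) registered
        self._run_callback = run_callback  # hypothetical scheduler hook

    def notify_on_ready(self, callback):
        """Called by thread (E) after starting an overlapped operation."""
        with self._lock:
            if not self._ready:
                # No event yet: park the callback where the poller will find it.
                self._callback = callback
                return
            self._ready = False
        # The event had already arrived: schedule outside the lock.
        self._run_callback(callback)

    def set_ready(self):
        """Called by poller thread (P) when an overlapped event completes."""
        with self._lock:
            callback, self._callback = self._callback, None
            if callback is None:
                # Nobody registered yet: remember the event for later.
                self._ready = True
                return
        self._run_callback(callback)
```

Because both methods inspect and update the shared state while holding the same lock, either (E) sees the event or (P) sees the callback, whichever order the two threads run in.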
If we send a large amount of data along with an ack, then we bundle that
ack in with a crypto frame that might be very large. This causes two
problems:
1. we need to decrypt a large frame (up to 1MB with ALTS) before we
can ack the ping, and on a slow connection this could take some time.
2. we need to do all the crypto work up front to send that ping ack,
also creating lots of work.
I think there's an equivalent fix needed on the ping send side, but less
urgent because it's not currently causing flakiness!
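To put rough numbers on the first problem: the receiver cannot see the ack until the whole frame carrying it has arrived and been decrypted. The sketch below only estimates the transfer time and ignores crypto cost. The 1 Mbit/s link speed and the 64-byte small frame are assumptions chosen for illustration, not measurements.

```python
def transfer_seconds(frame_bytes, link_bits_per_second):
    """Time to push one frame over the link, ignoring latency and crypto."""
    return frame_bytes * 8 / link_bits_per_second


LARGE_FRAME_BYTES = 1024 * 1024   # ack bundled into a frame of about 1MB
SMALL_FRAME_BYTES = 64            # assumed size of a frame holding only the ack
SLOW_LINK_BPS = 1_000_000         # assumed slow connection: 1 Mbit/s

bundled_delay = transfer_seconds(LARGE_FRAME_BYTES, SLOW_LINK_BPS)
separate_delay = transfer_seconds(SMALL_FRAME_BYTES, SLOW_LINK_BPS)
```

Under these assumptions the bundled ack waits for seconds of transfer time, while an ack in its own small frame waits for well under a millisecond.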
Bumps [gevent](https://github.com/gevent/gevent) from 22.08.0 to 23.9.1.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3977b6b3ec"><code>3977b6b</code></a>
Preparing release 23.9.1</li>
<li><a
href="068645423b"><code>0686454</code></a>
Typo.</li>
<li><a
href="bdc82c9e47"><code>bdc82c9</code></a>
Bump to greenlet 3.0rc3</li>
<li><a
href="ca0f9cb7b4"><code>ca0f9cb</code></a>
Bump to greenlet 3.0rc2 and require it on Py3.11 as well as 3.12. Also,
since...</li>
<li><a
href="0047199e04"><code>0047199</code></a>
Back to development: 23.9.0.post2</li>
<li><a
href="06879924ed"><code>0687992</code></a>
Preparing release 23.9.0.post1</li>
<li><a
href="166ecf1bc0"><code>166ecf1</code></a>
Fix windows wheel builds; ensure mac wheel builds have the universal2
tag</li>
<li><a
href="9b72b8c54e"><code>9b72b8c</code></a>
Back to development: 23.9.1</li>
<li><a
href="693181e8e1"><code>693181e</code></a>
Preparing release 23.9.0</li>
<li><a
href="6fc78989b6"><code>6fc7898</code></a>
Set the cython version; go back to default wheel tags.</li>
<li>Additional commits viewable in <a
href="https://github.com/gevent/gevent/compare/22.08.0...23.9.1">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This behavior is dangerous because we will crash when the cache is
created, which is not necessarily on application startup and is likely
when you first try to establish an SSL connection. Instead, we log an
error. If the SSL library attempts to put a session ticket in the cache
it will fail to do so, but everything else will continue as normal. In
particular, we will always seamlessly fall back to a full SSL handshake.
Along the way, we also ensure that you cannot put a null `SSL_SESSION`
into the cache, which would lead to a segfault when it is fetched from
the cache.
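The "log instead of crash" behavior can be sketched with a toy cache. This is not the gRPC SSL session cache; `SessionCache` is a hypothetical stand-in, and treating a non-positive capacity as the failing condition is an assumption made for the example.

```python
import logging
from collections import OrderedDict

logger = logging.getLogger(__name__)


class SessionCache:
    """Toy LRU cache standing in for an SSL session cache."""

    def __init__(self, capacity):
        if capacity <= 0:
            # Assumed failure condition: log it and leave the cache unusable
            # rather than crashing the process at creation time.
            logger.error("session cache capacity must be positive, got %d",
                         capacity)
        self._capacity = max(capacity, 0)
        self._entries = OrderedDict()

    def put(self, key, session):
        """Store a session; return False if it could not be stored."""
        if session is None:
            # A null session would blow up when fetched later, so reject it.
            logger.error("refusing to cache a null session for %r", key)
            return False
        if self._capacity == 0:
            return False
        self._entries[key] = session
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return True

    def get(self, key):
        """Return the cached session, or None to signal a full handshake."""
        session = self._entries.get(key)
        if session is not None:
            self._entries.move_to_end(key)
        return session
```

A caller that gets `None` from `get` simply performs a full handshake, so a cache that failed to initialize degrades performance but not correctness.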
The previous hack had bitrotted and was just returning 3.7. This method
was given to us by the GCF team and has backward compatibility
guarantees.
This will also help us to ensure that we don't accidentally remove
support for a particular Python runtime version before GCF does.
We shouldn't just set `termination_grace_period_seconds=600` by default
for all gamma tests extending `GammaXdsKubernetesTestCase`.
This is what's causing the deployment deletion issue:
> `framework.helpers.retryers.RetryError: Retry error calling
framework.xds_k8s_testcase.IsolatedXdsKubernetesTestCase.cleanup: 1
attempts exhausted. Last exception: RetryError: Retry error calling
framework.infrastructure.k8s.KubernetesNamespace.get_deployment: timeout
0:05:00 (h:mm:ss) exceeded. Check result callback returned False.`
We wait for 5 minutes, while the deployment is happily hanging for 10.
Then the second cleanup retry kills it, but not before waiting for
another 5 minutes.
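The timing mismatch can be worked through with a small simulation. It is a simplified model, not the test framework's retry logic: it assumes the deployment disappears exactly when the grace period ends and that retries run back to back. `simulate_cleanup` is a hypothetical helper.

```python
def simulate_cleanup(grace_period_s, retry_timeout_s, max_attempts):
    """Return (attempts_used, seconds_waited, succeeded).

    Each attempt waits for the deployment to disappear, up to
    retry_timeout_s. The deployment is assumed to go away exactly
    grace_period_s seconds after deletion was first requested.
    """
    waited = 0
    for attempt in range(1, max_attempts + 1):
        remaining = grace_period_s - waited
        if remaining <= retry_timeout_s:
            # The deployment goes away within this attempt's timeout.
            return attempt, waited + max(remaining, 0), True
        # The attempt times out while the pods are still terminating.
        waited += retry_timeout_s
    return max_attempts, waited, False


one_attempt = simulate_cleanup(600, 300, max_attempts=1)
two_attempts = simulate_cleanup(600, 300, max_attempts=2)
short_grace = simulate_cleanup(30, 300, max_attempts=1)
```

With a 600-second grace period and a 300-second timeout, a single attempt always fails, and two attempts succeed only after the full 10 minutes. With a short grace period, the first attempt succeeds quickly.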
I think `self.force = False` may be solving another issue triggered by
the `get_deployment` retry timeout: because we start over deleting the
resources by name, and some of them were already deleted by the first
attempt, we get a 404. And I'm pretty sure we don't handle errors
correctly when deleting CRD-based resources, which cascades into even
more unnecessary retries.
- GAMMA server runner: increase the wait time for the NEG annotation
from 1 minute to 3.
- Improve the wording around the wait for NEG methods to make it clear
this is the annotation we're waiting for - so it's not confused with
getting the NEG health from the GCP APIs.
ref b/298501683, b/302723651
#34274 is still seeing some headwinds against landing it (I continue to
believe it's the correct fix however).
I'm putting this out as a potential short term mitigation for the load
effects we were seeing, so we can try with folks experiencing problems
and see if it unblocks them.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We originally wanted a common format with internal OWNERS files, on the
thought that we'd use them and import them, and oh gosh that was the
wrong thing to think.
With the upcoming changes to tree management, having them here is going
to cause problems, so let's just update the CODEOWNERS files directly.
Foundation for being able to bazelify the build artifact -> build
package -> distribtest workflow tests.
Main ideas:
- "build artifact" and "build packages" will be represented by a custom
genrule (that runs the build on RBE under a docker container).
- since genrule doesn't support displaying logs for each target as a
separate "target log" (in the same way that bazel tests do), and we
generally want readable per-target logs for the bazelified test, a pair
of targets will be created for each "build artifact task":
  - a genrule that actually performs the build, creates an archive with
    artifacts and stores the exitcode and build log as rule outputs
  - a corresponding "build_test" sh_test that simply looks at the result
    of the genrule and presents the build log and build result as a
    "target log" for this test.
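The reporting half of that pair can be sketched as follows. The real target is an sh_test, so this Python version only illustrates the logic; `report_build_result` and the idea that the genrule writes the exit code and the log to two separate files are assumptions based on the description above.

```python
import sys
from pathlib import Path


def report_build_result(exitcode_path, log_path, out=sys.stdout):
    """Replay the stored build log and return the stored exit code.

    Both files are assumed to be outputs of the build genrule.
    """
    log_text = Path(log_path).read_text(errors="replace")
    out.write(log_text)
    if not log_text.endswith("\n"):
        out.write("\n")
    exit_code = int(Path(exitcode_path).read_text().strip())
    out.write(f"build exited with code {exit_code}\n")
    return exit_code
```

A test wrapper would call `sys.exit(report_build_result(...))`, so the test fails exactly when the build failed and the build log shows up as the test's own log.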
It turns out that it was previously not safe to run `grpc_init` in a
filter test: we'd end up mixing event engine implementations, causing
undefined behavior at `grpc_shutdown`.
This change makes it safe and fixes a test internally that's flaking at
70% right now (b/302986486).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix chttp2 too_many_pings test to use only one of IPv4 or IPv6,
depending on test environment.
Also fix dumb reversed conditional bug in some other tests that was
accidentally introduced in #34426.
Looks like we've got a thread race on shutdown with some of these
tests... adding a barrier at the head of tests that require precise
transport counts in order to stabilize.
Avoid depending on transitive includes. In particular, stop relying on
the transitive include status.h -> str_cat.h, which is removed in the
latest version of abseil.