This doesn't matter for gtest (it does its own sorting later), but for
the fuzzers this will probably save a bit of churn in the corpus and
lead to faster discovery of interesting cases.
CNR a WindowsEventEngine listener flake in:
* 10k local Windows development machine runs
* 50k Windows RBE runs
* 10k Windows VM runs
It fails ~5 times per day on the master CI jobs.
This PR adds some logging to try to see if an edge is missed, and
switches the thread pool implementation to see if that makes the flake
go away. If the flakes disappear, I'll try removing one or the other to
see if either independently fix the problem (hopefully not logging).
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Why: Cleanup for chttp2_transport ahead of promise conversion - lots of
logic has become interleaved throughout chttp2, so some effort to
isolate logic out is warranted ahead of that conversion.
What: Split configuration and policy tracking for each of ping rate
throttling and abuse detection into their own modules. Add tests for
them.
Incidentally: Split channel args into their own header so that we can
split the policy stuff into separate build targets.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
I'd been adding the following stanza regularly to debug flakes/fuzz
failures:
```
Expect(1, CoreEnd2endTest::MaybePerformAction{[&](bool success) {
Crash(absl::StrCat(
"Unexpected completion of client side call: success=",
success ? "true" : "false", " status=", server_status.ToString(),
" initial_md=", server_initial_metadata.ToString()));
}});
```
it was helpful because it indicated why a call batch finished
successfully and helped quickly identify next steps.
It occurred to me however that this would better be done inside of the
framework, and for *all* ops that have outputs, so this PR does just
that. Any time a batch with an op that outputs information finishes
successfully but unexpectedly we now display those outputs in human
readable form in the error message.
Sample output:
```
[ RUN ] CorpusExamples/FuzzerCorpusTest.RunOneExample/0
RUN TEST: Http2SingleHopTest.SimpleDelayedRequestShort/Chttp2SimpleSslFullstack
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:37] Create client side call
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:41] Start initial batch
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:47] Start server
E0101 00:00:05.000000000 396633 cq_verifier.cc:364] Verify tag(101)-✅ for 600000ms
test/core/end2end/cq_verifier.cc:316: Unexpected event: OP_COMPLETE: tag:0x1 OK
with:
incoming_metadata: {}
status_on_client: status=4 msg=Deadline Exceeded trailing_metadata={}
checked @ test/core/end2end/tests/simple_delayed_request.cc:51
expected:
test/core/end2end/tests/simple_delayed_request.cc:50: tag(101) success=true
```
Noticed this failing on an internal cl due to deadline exceeded errors.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Change the `ResolverFactory::GetDefaultAuthority()` method to %-encode
the authority by default, so individual resolver impls don't need to
remember to do this.
- Remove the hack in the xds resolver for setting the authority to
everything after the last `/` character.
- Change the `unix`, `unix-abstract`, and `vsock` resolvers to use a
real authority instead of hard-coding to "localhost".
In chttp2: a pending but not yet sent goaway should block incoming
requests just like a sent one (we will sent that data momentarily!)
In the test:
- handle the case of the connection idle timeout happening before the
request arrives at the server
- disable retries, as these cause the request to get stuck (as we don't
have an additional server to retry on)
Fix b/287897932
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We want writes to participate in event re-ordering, but it's unlikely
that we can sustain one byte per 500ms on all tests and keep them
passing (which is the degenerate case right now).
Tune write delays down to 50ms for the moment, though I expect we'll
want to talk about going lower.
omgwtfbbq
This test relies on WAIT_FOR_READY semantics, but we don't do that in
the proxy, so it got assigned the wrong suite.
Fix the suite, fix the flakes.
Also add some handy dandy logging to help figure this stuff out in the
future.
I can still make the old algorithm break and assign duplicate names on
my machine... make it a little more robust.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
I've had local runs with a 10 second gap between creating the call and
issuing the first batch client side.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Fix fuzzer found bug b/286716972
Follows up on https://github.com/grpc/grpc/pull/33266 but gets the edge
case right of when there's a read queued before the peer closes - in
that case we weren't waking up the read.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We should probably cap this so that our customers have a chance of
cloning the repository.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Here the recv message batch 103 was returning end of stream.
Per the reasoning in
https://github.com/grpc/proposal/blob/master/L104-core-ban-recv-with-send-status.md
Sending status is the final thing for a call on the server, so requiring
a recv message to complete when we've sent status is getting into at
best a gray area in out spec.
Add a strict ordering between that recv and the sending of status to
make a more deterministic test.
fixes b/286708835, b/286727273
- Switched from yapf to black
- Reconfigure isort for black
- Resolve black/pylint idiosyncrasies
Note: I used `--experimental-string-processing` because black was
producing "implicit string concatenation", similar to what described
here: https://github.com/psf/black/issues/1837. While currently this
feature is experimental, it will be enabled by default:
https://github.com/psf/black/issues/2188. After running black with the
new string processing so that the generated code merges these `"hello" "
world"` strings concatenations, then I removed
`--experimental-string-processing` for stability, and regenerated the
code again.
To the reviewer: don't even try to open "Files Changed" tab 😄 It's
better to review commit-by-commit, and ignore `run black and isort`.
Also drop a few deadlines so that tests can run faster (where that's
safe)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This had been intended to be 500ms for the first round, but
inadvertently got bumped up during some last minute investigations.
Tune it back down, let things settle out, and then see whether we want
to increase it or not.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
When I converted these tests there was an intent to use gtest failures
to see past an unexpected event/no event received error - however that
doesn't work out because our tests rely on the prior events happening.
What we got instead was misattributed failures, folks chasing wild
theories on uninitialized data, dogs and cats living together, mass
hysteria.
Let's just crash and see if it makes diagnosis actually easier.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->