Could not reproduce (CNR) a WindowsEventEngine listener flake in:
* 10k local Windows development machine runs
* 50k Windows RBE runs
* 10k Windows VM runs
It fails ~5 times per day on the master CI jobs.
This PR adds some logging to try to see if an edge is missed, and
switches the thread pool implementation to see if that makes the flake
go away. If the flakes disappear, I'll try removing one or the other to
see if either independently fixes the problem (hopefully not the logging).
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Why: Cleanup for chttp2_transport ahead of promise conversion - lots of
logic has become interleaved throughout chttp2, so some effort to
isolate that logic is warranted ahead of the conversion.
What: Split configuration and policy tracking for each of ping rate
throttling and abuse detection into their own modules. Add tests for
them.
Incidentally: Split channel args into their own header so that we can
split the policy stuff into separate build targets.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
I'd been adding the following stanza regularly to debug flakes/fuzz
failures:
```
Expect(1, CoreEnd2endTest::MaybePerformAction{[&](bool success) {
  Crash(absl::StrCat(
      "Unexpected completion of client side call: success=",
      success ? "true" : "false", " status=", server_status.ToString(),
      " initial_md=", server_initial_metadata.ToString()));
}});
```
It was helpful because it indicated why a call batch finished
successfully and helped quickly identify next steps.
It occurred to me, however, that this would be better done inside the
framework, and for *all* ops that have outputs, so this PR does just
that. Any time a batch with an op that outputs information finishes
successfully but unexpectedly we now display those outputs in human
readable form in the error message.
Sample output:
```
[ RUN ] CorpusExamples/FuzzerCorpusTest.RunOneExample/0
RUN TEST: Http2SingleHopTest.SimpleDelayedRequestShort/Chttp2SimpleSslFullstack
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:37] Create client side call
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:41] Start initial batch
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:47] Start server
E0101 00:00:05.000000000 396633 cq_verifier.cc:364] Verify tag(101)-✅ for 600000ms
test/core/end2end/cq_verifier.cc:316: Unexpected event: OP_COMPLETE: tag:0x1 OK
with:
incoming_metadata: {}
status_on_client: status=4 msg=Deadline Exceeded trailing_metadata={}
checked @ test/core/end2end/tests/simple_delayed_request.cc:51
expected:
test/core/end2end/tests/simple_delayed_request.cc:50: tag(101) success=true
```
Noticed this failing on an internal cl due to deadline exceeded errors.
- Change the `ResolverFactory::GetDefaultAuthority()` method to %-encode
the authority by default (see the sketch after this list), so individual
resolver impls don't need to remember to do this.
- Remove the hack in the xds resolver for setting the authority to
everything after the last `/` character.
- Change the `unix`, `unix-abstract`, and `vsock` resolvers to use a
real authority instead of hard-coding to "localhost".
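For illustration, the kind of encoding the default implementation now
applies might look like this minimal standalone sketch (not the actual
gRPC helper; the allowed character set follows the RFC 3986 authority
grammar):
```
#include <cctype>
#include <cstring>
#include <string>

// Standalone sketch: escape any byte not allowed in a URI authority as %XX.
std::string PercentEncodeAuthority(const std::string& in) {
  auto allowed = [](unsigned char c) {
    // unreserved / sub-delims / ":" / "@" / "[" / "]" per RFC 3986.
    return std::isalnum(c) || std::strchr("-._~!$&'()*+,;=:[]@", c) != nullptr;
  };
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    if (allowed(c)) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0xF]);
    }
  }
  return out;
}
```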
In chttp2: a pending but not yet sent goaway should block incoming
requests just like a sent one (we will send that data momentarily!)
In the test:
- handle the case of the connection idle timeout happening before the
request arrives at the server
- disable retries, as these cause the request to get stuck (as we don't
have an additional server to retry on)
Fix b/287897932
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We want writes to participate in event re-ordering, but it's unlikely
that we can sustain one byte per 500ms on all tests and keep them
passing (which is the degenerate case right now).
Tune write delays down to 50ms for the moment, though I expect we'll
want to talk about going lower.
omgwtfbbq
This test relies on WAIT_FOR_READY semantics, but we don't do that in
the proxy, so it got assigned the wrong suite.
Fix the suite, fix the flakes.
Also add some handy dandy logging to help figure this stuff out in the
future.
I've had local runs with a 10 second gap between creating the call and
issuing the first batch client side.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Here the recv message batch 103 was returning end of stream.
Per the reasoning in
https://github.com/grpc/proposal/blob/master/L104-core-ban-recv-with-send-status.md
Sending status is the final thing for a call on the server, so requiring
a recv message to complete when we've sent status is getting into at
best a gray area in our spec.
Add a strict ordering between that recv and the sending of status to
make a more deterministic test.
fixes b/286708835, b/286727273
Also drop a few deadlines so that tests can run faster (where that's
safe)
Put enough internal delays into this test and it hits deadline
exceeded... extend the deadline to cover that.
(this is likely to become a common edit over the next few weeks...)
Add a new binary that runs all core end2end tests in fuzzing mode.
In this mode FuzzingEventEngine is substituted for the default event
engine. This means that time is simulated, as is IO. The FEE gets
control of callback delays also.
In our tests the `Step()` function becomes, instead of a single call to
`completion_queue_next`, a series of calls to that function and
`FuzzingEventEngine::Tick`, driving forward the event loop until
progress can be made.
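A minimal sketch of that loop (assuming a `FuzzingEventEngine*` is
available to the test; illustrative, not the exact framework code):
```
#include <grpc/grpc.h>

#include "test/core/event_engine/fuzzing_event_engine/fuzzing_event_engine.h"

// Illustrative Step(): poll the CQ without blocking, and when nothing is
// ready advance simulated time so pending timers/IO can fire.
grpc_event Step(grpc_completion_queue* cq,
                grpc_event_engine::experimental::FuzzingEventEngine* engine) {
  for (;;) {
    grpc_event ev = grpc_completion_queue_next(
        cq, gpr_now(GPR_CLOCK_MONOTONIC), nullptr);
    if (ev.type != GRPC_QUEUE_TIMEOUT) return ev;
    engine->Tick();
  }
}
```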
PR guide:
---
**New binaries**
`core_end2end_test_fuzzer` - the new fuzzer itself
`seed_end2end_corpus` - a tool that produces an interesting seed corpus
**Config changes for safe fuzzing**
The implementation tries to use the config fuzzing work we've previously
deployed in api_fuzzer to fuzz across experiments. Since some
experiments are far too experimental to be safe in such fuzzing (and
this will always be the case):
- a new flag is added to experiments to opt out of this fuzzing
- a new hook is added to the config system to allow variables to
rewrite their inputs before setting them during the fuzz
**Event manager/IO changes**
Changes are made to the event engine shims so that tcp_server_posix can
run with a non-FD carrying EventEngine. These are in my mind a bit
clunky, but they work and they're in code that we expect to delete in
the medium term, so I think overall the approach is good.
**Changes to time**
A small tweak is made to fix a bug initializing time for fuzzers in
time.cc - we were previously failing to initialize
`g_process_epoch_cycles`
**Changes to `Crash`**
A version that prints to stdio is added so that we can reliably print a
crash from the fuzzer.
**Changes to CqVerifier**
Hooks are added to allow the top level loop to hook the verification
functions with a function that steps time between CQ polls.
**Changes to end2end fixtures**
State machinery moves from the fixture to the test infra, to keep the
customizations for fuzzing or not in one place. This means that fixtures
are now just client/server factories, which is overall nice.
It did necessitate moving some bespoke machinery into
h2_ssl_cert_test.cc - this file is becoming problematic in that it
borrows some, but not all, of the e2e test machinery. Some future PR
needs to solve this.
A cq arg is added to the Make functions since the cq is now owned by the
test and not the fixture.
**Changes to test registration**
`TEST_P` is replaced by `CORE_END2END_TEST` and our own test registry is
used as the primary store of test information.
The gtest version of these tests: queries that registry to manually
register tests with gtest. This ultimately changes the name of our tests
again (I think for the last time) - the new names are shorter and more
readable, so I don't count this as a regression.
The fuzzer version of these tests: constructs a database of fuzzable
tests that it can consult to look up a particular suite/test/config
combination specified by the fuzzer to fuzz against. This gives us a
single fuzzer that can test all 3k-ish fuzzing-ready tests and
cross-pollinate configuration between them.
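As a rough illustration of the one-registry-two-consumers shape (a toy
sketch, not the actual `CORE_END2END_TEST` machinery):
```
#include <functional>
#include <map>
#include <string>

// Toy registry: each test registers a name and a body once; a gtest runner
// and a fuzzer can then both walk the same map to instantiate tests.
struct TestRegistry {
  using Body = std::function<void()>;
  static std::map<std::string, Body>& Tests() {
    static std::map<std::string, Body> tests;
    return tests;
  }
  static int Register(const std::string& name, Body body) {
    Tests().emplace(name, std::move(body));
    return 0;
  }
};

#define MY_END2END_TEST(suite, name)                    \
  static void suite##_##name##_Body();                  \
  static const int suite##_##name##_registered =        \
      TestRegistry::Register(#suite "." #name,          \
                             suite##_##name##_Body);    \
  static void suite##_##name##_Body()

MY_END2END_TEST(Http2SingleHopTest, SimpleDelayedRequestShort) {
  // test body would run against a fixture chosen by the consumer
}
```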
**Changes to test config**
The zero size registry stuff was causing some problems with the event
engine feature macros, so instead I've removed those and used GTEST_SKIP
in the problematic tests. I think that's the approach we move towards in
the future.
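For reference, the GTEST_SKIP pattern looks roughly like this (the
fixture and feature-probe names are hypothetical):
```
#include "gtest/gtest.h"

class SomeEnd2endTest : public ::testing::Test {
 protected:
  // Hypothetical probe; stands in for whatever capability the fixture exposes.
  bool FixtureSupportsNeededFeature() const { return false; }
};

TEST_F(SomeEnd2endTest, NeedsEventEngineFeature) {
  if (!FixtureSupportsNeededFeature()) {
    GTEST_SKIP() << "fixture does not support the required feature";
  }
  // Test body runs only when the feature is available.
}
```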
**Which tests are included**
Configs that are compatible: those that do not do fd manipulation
directly (fd manipulation is incompatible with FuzzingEventEngine), and
those that do not join threads on their shutdown path (thread joins are
incompatible with our cq wait methodology). Both restrictions can be
revisited in the future - fd manipulation would require a significant
expansion of FuzzingEventEngine and is probably not worth it, but many
current uses of background threads should probably evolve into
EventEngine::Run calls, at which point they would be trivially enabled
in the fuzzers.
Some tests currently fail in the fuzzing environment; a
`SKIP_IF_FUZZING` macro disables those few there. We'll burn these down
in the future.
**Changes to fuzzing_event_engine**
Changes are made to time: an exponential sweep forward is used now -
this catches small time-precision issues early while still making
decade-long timers (we have them) usable right now. In the future we'll
just skip time forward to the next scheduled timer, but that approach
doesn't yet work due to legacy timer system interactions.
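The shape of that sweep is roughly (a toy sketch, not the
FuzzingEventEngine implementation):
```
#include <chrono>

// Toy sketch: each Tick() advances simulated time by a geometrically growing
// step, so sub-millisecond precision issues surface in the first few ticks
// while decade-long timers are still reached within a few dozen ticks.
class SimulatedClock {
 public:
  void Tick() {
    now_ += step_;
    if (step_ < std::chrono::hours(24 * 365)) step_ *= 2;
  }
  std::chrono::nanoseconds now() const { return now_; }

 private:
  std::chrono::nanoseconds now_{0};
  std::chrono::nanoseconds step_{std::chrono::microseconds(1)};
};
```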
Changes to port assignment: we ensure that ports are legal numbers
before assigning them via `grpc_pick_port_or_die`.
A race condition between time checking and io is fixed.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Expand server promises to run with C++ end2end tests.
Across connected_channel/call/batch_builder/pipe/transport:
- fix a bug where read errors weren't propagated from transport to call,
so that we can populate failed_before_recv_message for the C++ bindings
- ensure those errors are not, however, used to populate the returned
call status
Add a new latch call arg to lazily propagate the bound CQ for a server
call (and client call, but here it's used degenerately - it's always
populated). This allows server calls to be properly bound to
pollsets.(1)/(2)
In call.cc:
- move some profiling code from FilterStackCall to Call, and then use it
in PromiseBasedCall (this should be cleaned up with tracing work)
- implement GetServerAuthority
In server.cc:
- use an RAII pattern on `MatchResult` to avoid a bug whereby a tag
could be dropped if we cancel a request after it's been matched but
before it's published
- fix deadline export to ServerContext
In resource_quota_server.cc:
- fix some long-standing flakes (that were finally obvious with the new
test code) - it's legal here to have client calls not arrive at the
server due to resource starvation, work through that (includes adding
expectations during a `Step` call, which required some small tweaks to
cq_verifier)
In the C++ end2end_test.cc:
- strengthen a flaky test so it passes consistently (it's likely we'll
revisit this with the fuzzing efforts to strengthen it into an actually
robust test)
(1) It's time to remove this concept
(2) Surprisingly the only test that *reliably* demonstrates this not
being done is time_change_test
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This error can trigger for either initial or trailing metadata (and
we've had outages where the latter was the cause).
I don't think we know at this layer if we're parsing initial or trailing
- though it'd be a good exercise to plumb that through.
For now, remove the word "initial", because it's better to give less
information than wrong information.
Notes:
- `+trace` fixtures haven't run since 2016, so they're disabled for now
(7ad2d0b463 (diff-780fce7267c34170c1d0ea15cc9f65a7f4b79fefe955d185c44e8b3251cf9e38R76))
- all current fixtures define `FEATURE_MASK_SUPPORTS_AUTHORITY_HEADER`
and hence `authority_not_supported` has not been run in years - deleted
- `bad_hostname` similarly hasn't been triggered in a long while, so
deleted
- `load_reporting_hook` has never been enabled, so deleted
(f23fb4cf31/test/core/end2end/generate_tests.bzl (L145-L148))
- `filter_latency` & `filter_status_code` rely on global variables and so
don't convert particularly cleanly - and their value seems marginal, so
deleted
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
(hopefully last try)
Add a new channel arg GRPC_ARG_ABSOLUTE_MAX_METADATA_SIZE as a hard
limit for metadata. Change GRPC_ARG_MAX_METADATA_SIZE to be a soft
limit. Behavior is as follows (see the sketch below):
Hard limit:
(1) If the hard limit is explicitly set, it will be used.
(2) If the hard limit is not explicitly set, the maximum of the default
and soft limit * 1.25 (if the soft limit is set) will be used.
Soft limit:
(1) If the soft limit is explicitly set, it will be used.
(2) If the soft limit is not explicitly set, the maximum of the default
and hard limit * 0.8 (if the hard limit is set) will be used.
Requests between the soft and hard limits will be rejected randomly;
requests above the hard limit will be rejected.
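The resolution rules above can be summarized by the following sketch
(illustrative only; the names and default value are not the actual
channel-arg plumbing):
```
#include <algorithm>
#include <cstddef>
#include <optional>

struct MetadataLimits {
  size_t soft;
  size_t hard;
};

// Sketch of the rules above: explicit settings win; otherwise each limit is
// derived from the other (soft * 1.25 for hard, hard * 0.8 for soft), never
// dropping below the default.
MetadataLimits ResolveMetadataLimits(std::optional<size_t> soft_arg,
                                     std::optional<size_t> hard_arg,
                                     size_t default_limit) {
  MetadataLimits out;
  if (hard_arg.has_value()) {
    out.hard = *hard_arg;
  } else if (soft_arg.has_value()) {
    out.hard = std::max(default_limit,
                        static_cast<size_t>(*soft_arg * 1.25));
  } else {
    out.hard = default_limit;
  }
  if (soft_arg.has_value()) {
    out.soft = *soft_arg;
  } else if (hard_arg.has_value()) {
    out.soft = std::max(default_limit,
                        static_cast<size_t>(*hard_arg * 0.8));
  } else {
    out.soft = default_limit;
  }
  return out;
}
```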
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>