grpc][Gpr_To_Absl_Logging] Migrating from gpr to absl logging GPR_ASSERT
Replacing GPR_ASSERT with absl CHECK
These changes have been made using string replacement
Will not be replacing all instances of CHECK with CHECK_EQ , CHECK_NE etc because there are too many callsites. Only ones which are doable using very simple regex with least chance of failure will be replaced.
Given that we have 5000+ instances of GPR_ASSERT to edit, Doing it manually is too much work for both the author and reviewer.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#36399
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/36399 from tanvi-jagtap:tjagtap_core_end2end aeda9ecdea
PiperOrigin-RevId: 627626105
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
I'd been adding the following stanza regularly to debug flakes/fuzz
failures:
```
Expect(1, CoreEnd2endTest::MaybePerformAction{[&](bool success) {
Crash(absl::StrCat(
"Unexpected completion of client side call: success=",
success ? "true" : "false", " status=", server_status.ToString(),
" initial_md=", server_initial_metadata.ToString()));
}});
```
it was helpful because it indicated why a call batch finished
successfully and helped quickly identify next steps.
It occurred to me however that this would better be done inside of the
framework, and for *all* ops that have outputs, so this PR does just
that. Any time a batch with an op that outputs information finishes
successfully but unexpectedly we now display those outputs in human
readable form in the error message.
Sample output:
```
[ RUN ] CorpusExamples/FuzzerCorpusTest.RunOneExample/0
RUN TEST: Http2SingleHopTest.SimpleDelayedRequestShort/Chttp2SimpleSslFullstack
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:37] Create client side call
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:41] Start initial batch
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:47] Start server
E0101 00:00:05.000000000 396633 cq_verifier.cc:364] Verify tag(101)-✅ for 600000ms
test/core/end2end/cq_verifier.cc:316: Unexpected event: OP_COMPLETE: tag:0x1 OK
with:
incoming_metadata: {}
status_on_client: status=4 msg=Deadline Exceeded trailing_metadata={}
checked @ test/core/end2end/tests/simple_delayed_request.cc:51
expected:
test/core/end2end/tests/simple_delayed_request.cc:50: tag(101) success=true
```
I've had local runs with a 10 second gap between creating the call and
issuing the first batch client side.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
When I converted these tests there was an intent to use gtest failures
to see past an unexpected event/no event received error - however that
doesn't work out because our tests rely on the prior events happening.
What we got instead was misattributed failures, folks chasing wild
theories on uninitialized data, dogs and cats living together, mass
hysteria.
Let's just crash and see if it makes diagnosis actually easier.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Add a new binary that runs all core end2end tests in fuzzing mode.
In this mode FuzzingEventEngine is substituted for the default event
engine. This means that time is simulated, as is IO. The FEE gets
control of callback delays also.
In our tests the `Step()` function becomes, instead of a single call to
`completion_queue_next`, a series of calls to that function and
`FuzzingEventEngine::Tick`, driving forward the event loop until
progress can be made.
PR guide:
---
**New binaries**
`core_end2end_test_fuzzer` - the new fuzzer itself
`seed_end2end_corpus` - a tool that produces an interesting seed corpus
**Config changes for safe fuzzing**
The implementation tries to use the config fuzzing work we've previously
deployed in api_fuzzer to fuzz across experiments. Since some
experiments are far too experimental to be safe in such fuzzing (and
this will always be the case):
- a new flag is added to experiments to opt-out of this fuzzing
- a new hook is added to the config system to allow variables to
re-write their inputs before setting them during the fuzz
**Event manager/IO changes**
Changes are made to the event engine shims so that tcp_server_posix can
run with a non-FD carrying EventEngine. These are in my mind a bit
clunky, but they work and they're in code that we expect to delete in
the medium term, so I think overall the approach is good.
**Changes to time**
A small tweak is made to fix a bug initializing time for fuzzers in
time.cc - we were previously failing to initialize
`g_process_epoch_cycles`
**Changes to `Crash`**
A version that prints to stdio is added so that we can reliably print a
crash from the fuzzer.
**Changes to CqVerifier**
Hooks are added to allow the top level loop to hook the verification
functions with a function that steps time between CQ polls.
**Changes to end2end fixtures**
State machinery moves from the fixture to the test infra, to keep the
customizations for fuzzing or not in one place. This means that fixtures
are now just client/server factories, which is overall nice.
It did necessitate moving some bespoke machinery into
h2_ssl_cert_test.cc - this file is beginning to be problematic in
borrowing parts but not all of the e2e test machinery. Some future PR
needs to solve this.
A cq arg is added to the Make functions since the cq is now owned by the
test and not the fixture.
**Changes to test registration**
`TEST_P` is replaced by `CORE_END2END_TEST` and our own test registry is
used as a first depot for test information.
The gtest version of these tests: queries that registry to manually
register tests with gtest. This ultimately changes the name of our tests
again (I think for the last time) - the new names are shorter and more
readable, so I don't count this as a regression.
The fuzzer version of these tests: constructs a database of fuzzable
tests that it can consult to look up a particular suite/test/config
combination specified by the fuzzer to fuzz against. This gives us a
single fuzzer that can test all 3k-ish fuzzing ready tests and cross
polinate configuration between them.
**Changes to test config**
The zero size registry stuff was causing some problems with the event
engine feature macros, so instead I've removed those and used GTEST_SKIP
in the problematic tests. I think that's the approach we move towards in
the future.
**Which tests are included**
Configs that are compatible - those that do not do fd manipulation
directly (these are incompatible with FuzzingEventEngine), and those
that do not join threads on their shutdown path (as these are
incompatible with our cq wait methodology). Each we can talk about in
the future - fd manipulation would be a significant expansion of
FuzzingEventEngine, and is probably not worth it, however many uses of
background threads now should probably evolve to be EventEngine::Run
calls in the future, and then would be trivially enabled in the fuzzers.
Some tests currently fail in the fuzzing environment, a
`SKIP_IF_FUZZING` macro is used for these few to disable them if in the
fuzzing environment. We'll burn these down in the future.
**Changes to fuzzing_event_engine**
Changes are made to time: an exponential sweep forward is used now -
this catches small time precision things early, but makes decade long
timers (we have them) able to be used right now. In the future we'll
just skip time forward to the next scheduled timer, but that approach
doesn't yet work due to legacy timer system interactions.
Changes to port assignment: we ensure that ports are legal numbers
before assigning them via `grpc_pick_port_or_die`.
A race condition between time checking and io is fixed.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Expand server promises to run with C++ end2end tests.
Across connected_channel/call/batch_builder/pipe/transport:
- fix a bug where read errors weren't propagated from transport to call
so that we can populate failed_before_recv_message for the c++ bindings
- ensure those errors are not, however, used to populate the returned
call status
Add a new latch call arg to lazily propagate the bound CQ for a server
call (and client call, but here it's used degenerately - it's always
populated). This allows server calls to be properly bound to
pollsets.(1)/(2)
In call.cc:
- move some profiling code from FilterStackCall to Call, and then use it
in PromiseBasedCall (this should be cleaned up with tracing work)
- implement GetServerAuthority
In server.cc:
- use an RAII pattern on `MatchResult` to avoid a bug whereby a tag
could be dropped if we cancel a request after it's been matched but
before it's published
- fix deadline export to ServerContext
In resource_quota_server.cc:
- fix some long standing flakes (that were finally obvious with the new
test code) - it's legal here to have client calls not arrive at the
server due to resource starvation, work through that (includes adding
expectations during a `Step` call, which required some small tweaks to
cq_verifier)
In the C++ end2end_test.cc:
- strengthen a flaky test so it passes consistently (it's likely we'll
revisit this with the fuzzing efforts to strengthen it into an actually
robust test)
(1) It's time to remove this concept
(2) Surprisingly the only test that *reliably* demonstrates this not
being done is time_change_test
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Notes:
- `+trace` fixtures haven't run since 2016, so they're disabled for now
(7ad2d0b463 (diff-780fce7267c34170c1d0ea15cc9f65a7f4b79fefe955d185c44e8b3251cf9e38R76))
- all current fixtures define `FEATURE_MASK_SUPPORTS_AUTHORITY_HEADER`
and hence `authority_not_supported` has not been run in years - deleted
- bad_hostname similarly hasn't been triggered in a long while, so
deleted
- load_reporting_hook has never been enabled, so deleted
(f23fb4cf31/test/core/end2end/generate_tests.bzl (L145-L148))
- filter_latency & filter_status_code rely on global variables and so
don't convert particularly cleanly - and their value seems marginal, so
deleted
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>