As per gRFC A58, when WRR sees a subchannel report READY, it resets the
non_empty_since value, thus restarting the blackout period. However,
there were two cases where we were incorrectly triggering this code:
1. When WRR got an updated address list that contained addresses that
were already present on the old list and whose subchannels were already
in READY state, the initial notification for those subchannels on the
new list was READY, which incorrectly triggered resetting the
non_empty_since value.
2. Due to a bug in the outlier_detection policy, whenever an update was
propagated down through the OD policy without actually enabling OD, it
would incorrectly send a duplicate connectivity state notification for
the subchannels. This meant that a subchannel that was already in state
READY would report READY again, which would also incorrectly trigger
resetting the non_empty_since value.
This PR makes two changes:
1. Fix the bug in outlier_detection that caused it to generate the
spurious duplicate READY updates.
2. Fix WRR to reset the non_empty_since value when a subchannel goes
READY only if the subchannel has seen a previous state update and only
if that previous state was not READY, as sketched below. (The duplicate
READY notifications should not actually happen anymore now that the OD
policy has been fixed, but better to be defensive.)
Fixes b/290983884.
Makes the changes necessary to run the new PSM interop framework on
Ubuntu 22.04:
- Install dependencies via apt: `kubectl`,
`google-cloud-sdk-gke-gcloud-auth-plugin` (previously were
pre-provisioned or available as part of gcloud distribution)
- Use venv instead of pyenv
- Use Python 3.10 instead of Python 3.9
Other changes:
- Do not update gcloud components - the one provisioned is relatively
recent, and expected to be updated as new base images are released
- Unpin pip from `21.0.1`. Not sure if we're OK with using the latest
one via `venv --upgrade-deps`, or if we should just pin it to something
more recent (currently it resolves to `pip 22.0.2` and `setuptools 68.0.0`)
ref b/274944592, cl/547690787
Adds access token lifetime configuration for workload identity
federation with service account impersonation for both explicit and
implicit flows.
Changes:
1. Adds a new member "service_account_impersonation" to the
ExternalAccountCredentials class. "token_lifetime_seconds" is a member
of "service_account_impersonation".
2. Adds validation checks during the creation of an
ExternalAccountCredentials object, e.g. that token_lifetime_seconds is
between the minimum and maximum accepted values.
3. Appends "lifetime" to the body of the service account impersonation
request; an example configuration is sketched below.
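For illustration, a sketch of a credential configuration that sets the
new field, assuming the `grpc::ExternalAccountCredentials` factory from
`<grpcpp/security/credentials.h>`; the URLs, identifiers, and lifetime
value are placeholders:
```
#include <memory>
#include <vector>

#include <grpcpp/security/credentials.h>

// Illustrative only: the JSON below is a placeholder configuration, not a
// working one.
std::shared_ptr<grpc::CallCredentials> MakeExternalAccountCreds() {
  constexpr char kConfig[] = R"json({
    "type": "external_account",
    "audience": "//iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/pool/providers/provider",
    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    "token_url": "https://sts.googleapis.com/v1/token",
    "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/sa@project.iam.gserviceaccount.com:generateAccessToken",
    "service_account_impersonation": {
      "token_lifetime_seconds": 2800
    },
    "credential_source": {"file": "/path/to/subject/token"}
  })json";
  return grpc::ExternalAccountCredentials(
      kConfig, {"https://www.googleapis.com/auth/cloud-platform"});
}
```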
Tests:
1. Modifies a test to check if the default value is passed when
"service_account_impersonation" is empty.
2. Adds tests to check if the token_lifetime_seconds value is propagated
to the request body.
3. Adds tests to verify that an error is thrown when
token_lifetime_seconds is invalid.
So far, we've structured global C-core init/shutdown as follows:
1) every grpc-ruby object calls `grpc_init` when allocated, and
`grpc_shutdown` when finalized
2) grpc-ruby background threads are each wrapped in a
`grpc_init/shutdown` pair - for example see
b32d94de05/src/ruby/ext/grpc/rb_event_thread.c (L122)
and
b32d94de05/src/ruby/ext/grpc/rb_event_thread.c (L136)
But because ruby doesn't join ruby threads when the process is
terminating, the `init/shutdown` pairs in 2) are always left open. I.e.,
thus far we have never been invoking the final call to `grpc_shutdown`
which actually does C-core global shutdown. Thus our calls to
`grpc_shutdown` are useless.
So we might as well keep things simple and not even attempt to call
`grpc_shutdown`. Now we just call `grpc_init` before using C-core, and
then never even attempt global shutdown (in non-forking situations).
As a bonus, this fixes the issue with the event thread's
`grpc_ruby_init` racing with `prefork` that is mentioned in
https://github.com/grpc/grpc/pull/33666
As part of the dualstack backend designs, subchannels will be created
lazily. Therefore, instead of asserting that there is 1 READY subchannel
and `n - 1` IDLE subchannels, we just assert that there is 1 READY
subchannel.
- Support call finalizers in filter test.
- Add an accessor to the filter implementation from the channel, so that
it can be interrogated by tests.
- Matcher to ensure that some metadata is *not* in a metadata batch
(functionality needed to support the additional testing we talked about
this morning)
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
In preparation for implementing the promise based version, separate out
the legacy call data from the filter.
There are two commits here, each representing one phase of this code
movement:
66676d398c moves `class RetryFilter` into
the header and the vtable name into that class, as this will be shared
code between the implementations
4c84f115ad then moves `class
RetryFilter::CallData` into `class RetryFilterLegacyCallData`, and moves
*that* into its own file
Doing so makes me less confused as to what I'm editing going forward.
No functionality should be affected.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
In real services most of our time ends up in the `Read1()` function,
which populates one byte into the bit buffer.
Change this to read in as many bytes as possible at a time into that
buffer.
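As a sketch of the difference (illustrative only, not the generated gRPC
parser):
```
#include <cstddef>
#include <cstdint>

// Illustrative bit reader: bytes are shifted into a 64-bit buffer MSB-first.
struct BitSource {
  const uint8_t* data;
  size_t size;
  size_t pos = 0;
  uint64_t buffer = 0;
  int bits_in_buffer = 0;

  // Old approach: one byte per call.
  bool Read1() {
    if (pos == size) return false;
    buffer = (buffer << 8) | data[pos++];
    bits_in_buffer += 8;
    return true;
  }

  // New approach: top the buffer up with as many whole bytes as fit, so the
  // decode loop refills far less often.
  void RefillTo(int needed_bits) {
    while (bits_in_buffer < needed_bits && pos < size &&
           bits_in_buffer <= 56) {  // room for one more byte in 64 bits
      buffer = (buffer << 8) | data[pos++];
      bits_in_buffer += 8;
    }
  }
};
```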
Additionally, generate all possible (to some depth) parser geometries,
and add a benchmark for them. Run that benchmark and select the best
geometry for decoding base64 strings (since this is the main use-case).
(This gives about a 30% speed boost when parsing base64- then
huffman-encoded random binary strings.)
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Try again to get a trailers-only signal through to the logging filter.
(This needs a change for non-chttp2 transports... I think we should wait
until we have actual users using binary logging on non-HTTP/2 transports
to do that work.)
Detected with gcc 13:
```
In file included from /data/mwrep/res/osp/Grpc/23-0-0-0/include/grpcpp/impl/proto_utils.h:31,
from ./include/generated/gacms.object.grpc.pb.h:18,
from ./include/generated/gacms.object.grpc.pb.cc:6:
/data/mwrep/res/osp/Grpc/23-0-0-0/include/grpcpp/support/proto_buffer_reader.h: In member function 'virtual bool grpc::ProtoBufferReader::ReadCord(absl::lts_20230125::Cord*, int)':
/data/mwrep/res/osp/Grpc/23-0-0-0/include/grpcpp/support/proto_buffer_reader.h:157:24: error: comparison of integer expressions of different signedness: 'uint64_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare]
157 | if (slice_length <= count) {
| ~~~~~~~~~~~~~^~~~~~~~
/data/mwrep/res/osp/Grpc/23-0-0-0/include/grpcpp/support/proto_buffer_reader.h: In lambda function:
/data/mwrep/res/osp/Grpc/23-0-0-0/include/grpcpp/support/proto_buffer_reader.h:191:35: warning: unused parameter 'view' [-Wunused-parameter]
191 | [slice](absl::string_view view) { grpc_slice_unref(slice); });
| ~~~~~~~~~~~~~~~~~~^~~~
cc1plus: all warnings being treated as errors
```
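One way the two diagnostics above can be addressed (a sketch of the
general pattern, not necessarily the exact change applied):
```
#include <cstdint>

// Sketch only: compare like-signed types, as in the ReadCord() snippet above.
bool SliceFits(uint64_t slice_length, int count) {
  // Guard against negative counts before casting, then compare unsigned.
  return count >= 0 && slice_length <= static_cast<uint64_t>(count);
}

// For the unused-parameter warning, omit the parameter name:
//   [slice](absl::string_view /*view*/) { grpc_slice_unref(slice); });
```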
Adds experimental fork support to gRPC/Ruby
Works towards https://github.com/grpc/grpc/issues/8798 (see caveats for why this wasn't marked fixed yet)
Works towards https://github.com/grpc/grpc/issues/33578 (see caveats for why this wasn't marked fixed yet)
This leverages existing `pthread_atfork` based C-core support for
forking that python/php use, but there's a bit extra involved mainly
because gRPC/Ruby has additional background threads.
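For context, `pthread_atfork` lets a library register handlers that run
around `fork()`; a generic illustration (not the actual gRPC fork
handlers):
```
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdio>

// The prepare handler runs before fork(); the other two run afterwards in
// the parent and child respectively.
static void Prepare() { std::puts("prefork: quiesce background threads"); }
static void PostforkParent() { std::puts("postfork parent: restart threads"); }
static void PostforkChild() { std::puts("postfork child: reinitialize state"); }

int main() {
  pthread_atfork(Prepare, PostforkParent, PostforkChild);
  pid_t pid = fork();  // triggers the handlers registered above
  if (pid == 0) _exit(0);
  return 0;
}
```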
New tests under `src/ruby/end2end` show example usage.
Based on https://github.com/grpc/grpc/pull/33495
Caveats:
- Bidi streams are not yet supported (bidi streams spawn background
threads which are not yet fork safe)
- Servers not supported
- Only linux supported
tcp_posix_test incorrectly assumes that all endpoint_writes with
timestamps enabled will be successfully traced. Remove the
timestamp-checking tests to prevent flakes when the test is enabled
internally.
Most of the time parsing succeeds, and only rarely do we see an error.
This change reduces the parse memento size from 120 bytes to 56 bytes.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Remove `src/ruby/ext/grpc/objs/` and strip `src/ruby/ext/grpc/libs/` in
ruby build dir to save some space.
This can reduce the grpc gem size from more than 1G to about 70~80M.
This only affects the opt build; the dbg build remains the same.
A proper fix may be to use the same makefile to build C-core and the
extension, rather than building C-core when generating the extension
makefile.
fixes: https://github.com/grpc/grpc/issues/33412
I'd been adding the following stanza regularly to debug flakes/fuzz
failures:
```
Expect(1, CoreEnd2endTest::MaybePerformAction{[&](bool success) {
Crash(absl::StrCat(
"Unexpected completion of client side call: success=",
success ? "true" : "false", " status=", server_status.ToString(),
" initial_md=", server_initial_metadata.ToString()));
}});
```
It was helpful because it indicated why a call batch finished
successfully and helped quickly identify next steps.
It occurred to me, however, that this would be better done inside the
framework, and for *all* ops that have outputs, so this PR does just
that. Any time a batch with an op that outputs information finishes
successfully but unexpectedly, we now display those outputs in
human-readable form in the error message.
Sample output:
```
[ RUN ] CorpusExamples/FuzzerCorpusTest.RunOneExample/0
RUN TEST: Http2SingleHopTest.SimpleDelayedRequestShort/Chttp2SimpleSslFullstack
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:37] Create client side call
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:41] Start initial batch
E0101 00:00:05.000000000 396633 simple_delayed_request.cc:47] Start server
E0101 00:00:05.000000000 396633 cq_verifier.cc:364] Verify tag(101)-✅ for 600000ms
test/core/end2end/cq_verifier.cc:316: Unexpected event: OP_COMPLETE: tag:0x1 OK
with:
incoming_metadata: {}
status_on_client: status=4 msg=Deadline Exceeded trailing_metadata={}
checked @ test/core/end2end/tests/simple_delayed_request.cc:51
expected:
test/core/end2end/tests/simple_delayed_request.cc:50: tag(101) success=true
```
The intuition here is that these strings may end up in the hpack table,
and then unnecessarily extend the lifetime of the read blocks.
Instead, take a copy of these short strings when we need to and allow
the incoming large memory object to be discarded.
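A sketch of the tradeoff (illustrative types only; the real code uses
gRPC's slice machinery):
```
#include <memory>
#include <string>
#include <string_view>

// Stand-in for a large refcounted network read buffer.
struct ReadBlock {
  std::string bytes;  // typically many KiB of wire data
};

// Keeping a view ties the whole block's lifetime to the hpack table entry...
struct BorrowedHeaderValue {
  std::shared_ptr<ReadBlock> owner;
  std::string_view value;
};

// ...whereas copying a short string lets the block be discarded immediately.
std::string CopyShortHeaderValue(std::string_view value_in_block) {
  return std::string(value_in_block);
}
```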
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt out of these PRs, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Noticed this failing on an internal cl due to deadline exceeded errors.