follow up from #33590
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We are seeing `g++: fatal error: Killed signal terminated program
cc1plus` on PHP distribtest builds. In case it's an OOM, let's try
reducing the build parallelism to see if it helps.
Promises code can prevent these bad requests from even reaching the
application, which is beneficial but this test needs a minor update to
handle it.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This test has been disabled for a long time now due to flakiness, but
it's now causing problems with the import. And stress tests don't
provide positive ROI anyway, so let's just get rid of it.
Based on the discussion at:
595a75cc5d..e3b402a8fa (r1244325752)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
There is a bug in the SSL stack that was only partially fixed in #29176:
if more than 17kb is written to the BIO buffer, then everything over
17kb will be discarded, and the SSL handshake will fail with a bad
record mac error or hang if not enough bytes have arrived yet.
It's relatively uncommon to hit this bug, because the TLS handshake
messages need to be much larger than normal for you to have a chance of
hitting this bug. However, there was a separate bug in the SSL stack
(recently fixed in #33558) that causes the ServerHello produced by a
gRPC-C++ TLS server to grow linearly in size with the size of the trust
bundle; these 2 bugs combined to cause a large number of TLS handshake
failures for gRPC-C++ clients talking to gRPC-C++ servers when the
server had a large trust bundle.
This PR fixes the bug by ensuring that all bytes are successfully
written to the BIO buffer. An initial quick fix for this bug was planned
in #33611, but abandoned because we were worried about temporarily
doubling the memory footprint of all SSL channels.
The complexity in this PR is mostly in the test: it is fairly tricky to
force gRPC-C++'s SSL stack to generate a sufficiently large ServerHello
to trigger this bug.
Going forward `[[nodiscard]]` is the portable way to spell this;
requires yanking a bunch of usage from after the param list to before.
We should further refine the GRPC_MUST_USE_RESULT macro to make it work
uniformly for any compilers that it doesn't today (most likely by making
it expand to nothing).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: Vignesh2208 <Vignesh2208@users.noreply.github.com>
URI should be resolved to unix abstract only if the first byte is NULL
but the path length is > 1 (i.e there is more than one byte in the
path). Otherwise treat is as unix URI.
Moreover, when the AsyncConnectionAcceptor is destroyed, if it was
managing a unix domain socket, then the socket should be unlinked to
ensure we dont leave the underlying file hanging around forever.
This (or something like it) is necessary for the automated protobuf
upgrade script, to ensure we don't commit a redundant third_party tree
to the repo. Here is an example of the protobuf upgrade without this
fix: https://github.com/grpc/grpc/pull/33625
One alternative could be adding the filepaths to `.gitignore`.
Spun off from https://github.com/grpc/grpc/pull/33695
Note that the plugin is still under `grpc::internal` namespace and not
under `experimental` intentionally.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
It has been decided that we will no longer try and split Grpc.Tools
NuGet package logic into two packages by adding logic to
Google.Protobuf.Tools. See
https://github.com/protocolbuffers/protobuf/issues/5957
This PR removes some of the TODOs and redundant files.
There is further simplification work that could be done to combine the
props and targets files that were already split. But this is the first
step in the tidying up the code.
This reverts the following PRs: #32692#33087#33093#33427#33568
These changes seem to have introduced some flaky crashes. Reverting
while I investigate.
PanCakes to the rescue!
We noticed that our 'sanity' test was going to fail, but we think we can
fix that automatically, so we put together this PR to do just that!
If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS
in .github/workflows/pr-auto-fix.yaml
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This was done manually due to a problem with
`tools/distrib/python/make_grpcio_tools.py`. ~I fixed it in this PR
(depends on cl/547979185), so there is a fair chance this upgrade will
work normally for the next release.~ The fix may be problematic for
upgrading protobuf on older release branches, so the improvement will be
worked on separately. CC @jtattermusch
This also updates the UPB dep to the latest commit on the 23.x branch.
As per gRFC A58, when WRR sees a subchannel report READY, it reset the
non_empty_since value, thus restarting the blackout period. However,
there were two cases where we were incorrectly triggering this code:
1. When WRR got an updated address list that contained addresses that
were already present on the old list and whose subchannels were already
in READY state, the initial notification for those subchannels on the
new list was READY, which incorrectly triggered resetting the
non_empty_since value.
2. Due to a bug in the outlier_detection policy, whenever an update was
propagated down through the OD policy without actually enabling OD, it
would incorrectly send a duplicate connectivity state notification for
the subchannels. This meant that a subchannel that was already in state
READY would report READY again, which would also incorrectly trigger
resetting the non_empty_since value.
This PR makes two changes:
1. Fix the bug in outlier_detection that caused it to generate the
spurious duplicate READY updates.
2. Fix WRR to reset the non_empty_since value when a subchannel goes
READY only if the subchannel has seen a previous state update and only
if that previous state was not READY. (The duplicate READY notifications
should not actually happen anymore now that the OD policy has been
fixed, but better to be defensive.)
Fixes b/290983884.
Makes the changes necessary to run the new PSM interop framework on
Ubuntu 22.04:
- Install dependencies via apt: `kubectl`,
`google-cloud-sdk-gke-gcloud-auth-plugin` (previously were
pre-provisioned or available as part of gcloud distribution)
- Use venv instead of pyenv
- Use python 3.10 instead of python3 .9
Other changes:
- Do not update gcloud components - the one provisioned is relatively
recent, and expected to be updated as new base images are released
- Unpin pip from `21.0.1`. Not sure if we're OK about using the latest
one `venv --upgrade-deps`, or if we should just pin it to something more
recent (currently it resolves to `pip 22.0.2` and `setuptools 68.0.0`)
ref b/274944592, cl/547690787