This is a prerequisite change to start supporting Bazel 7. Changes are
- Disabled bzlmod which Bazel 7 begins to enable by default. This eventually needs to be done to support bzlmod but not now.
- Upgraded some bazel rule dependencies which are required to support Bazel 7.
- Using Python 3 explcitly as Bazel 7 begins to reject Python 2.
Note that this isn't enough to enable Bazel 7 by default and another PR will follow for that.
Closes#35374
PiperOrigin-RevId: 592931675
This commit upgrades gRPC to protobuf v25.0 and makes some fixes to
account for upb changes. One major change is that upb has been merged
into the protobuf repo, so we can now drop the separate `@upb`
dependency. Another is that `.upb.c` files no longer exist and there are
new `.upb_minitable.h` and `.upb_minitable.c` files. The longer
filenames exceeded a Windows restriction, so to work around that I
renamed the `upb-generated` directory to just `upb-gen`, and likewise
for `upbdefs-generated`.
Add a config to experiments & rollouts to allow dependent experiments to
be flagged.
We're getting past the point where it's possible to reason about which
experiments need to be turned off if we disable some other experiment,
and so this provides some additional rollout safety.
Can be specified in both experiments and rollouts: experiments.yaml
makes the most sense and we should default to it, but rollouts.yaml lets
us put dependencies between internal & external dependencies internally
and that's gonna be a little useful.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Add TcpTracer interface for TCP instrumentation. It takes no gRPC
dependencies for use in external TCP implementations. Also add
HttpAnnotation for HTTP transport instrumentation using CallTracer.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Cancel streams if we have too much concurrency on a single channel to
allow that server to recover.
There seems to be a convergence in the HTTP2 community about this being
a reasonable thing to do, so I'd like to try it in some real scenarios.
If this pans out well then I'll likely drop the
`red_max_concurrent_streams` and the `rstpit` experiments in preference
to this.
I'm also considering tying in resource quota so that under high memory
pressure we just default to this path.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Noticed that our behavior conflicts with what's mandated:
https://www.rfc-editor.org/rfc/rfc9113.html#section-5.1.2
So here's a quick fix - I like it because it's not a disconnection,
which always feels like one outage avoided.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Instead of fixing a target size for writes, try to adapt it a little to
observed bandwidth.
The initial algorithm tries to get large writes within 100-1000ms
maximum delay - this range probably wants to be tuned, but let's see.
The hope here is that on slow connections we can not back buffer so much
and so when we need to send a ping-ack it's possible without great
delay.
Experiment 1: On RST_STREAM: reduce MAX_CONCURRENT_STREAMS for one round
trip.
Experiment 2: If a settings frame is outstanding with a lower
MAX_CONCURRENT_STREAMS than is configured, and we receive a new incoming
stream that would exceed the new cap, randomly reject it.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Cap requests per read, rst_stream handled per read.
If these caps are exceeded, offload processing of the connection to a
backing thread pool, and allow other connections to make progress.
Previously chttp2 would allow infinite requests prior to a settings ack
- as the agreed upon limit for requests in that state is infinite.
Instead, after MAX_CONCURRENT_STREAMS requests have been attempted,
start blanket cancelling requests until the settings ack is received.
This can be done efficiently without allocating request state
structures.
We originally wanted a common format with internal OWNERS files on the
thought that we'd use them and import them and oh gosh that was the
wrong thing to think.
With upcoming changes to tree management having them here is going to
cause problems, so lets just update the CODEOWNERS files directly.
Isolate ping callback tracking to its own file.
Also takes the opportunity to simplify keepalive code by applying the
ping timeout to all pings.
Adds an experiment to allow multiple pings outstanding too (this was
originally an accidental behavior change of the work, but one that I
think may be useful going forward).
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We disabled this a little while ago for lack of CI bandwidth, but #34404
ought to have freed up enough capacity that we can keep running this.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Summary -
On the server-side, we are changing the point at which we decide whether
a method is registered or not from the surface to the transport at the
point where we are done receiving initial metadata and before we invoke
the recv_initial_metadata_ready closures from the filters. The main
motivation for this is to allow filters to check whether the incoming
method is a registered or not. The exact use-case is for observability
where we only want to record the method if it is registered. We store
the information about the registered method in the initial metadata.
On the client-side, we also set information about whether the method is
registered or not in the outgoing initial metadata.
Since we are effectively changing the lookup point of the registered
method, there are slight concerns of this being a potentially breaking
change, so we are guarding this with an experiment to be safe.
Changes -
* Transport API changes -
* Along with `accept_stream_fn`, a new callback
`registered_method_matcher_cb` will be sent down as a transport op on
the server side. When initial metadata is received on the server side,
this callback is invoked. This happens before invoking the
`recv_initial_metadata_ready` closure.
* Metadata changes -
* We add a new non-serializable metadata trait `GrpcRegisteredMethod()`.
On the client-side, the value is a uintptr_t with a value of 1 if the
call has a registered/known method, or 0, if it's not known. On the
server side, the value is a (ChannelRegisteredMethod*). This metadata
information can be used throughout the stack to check whether a call is
registered or not.
* Server Changes -
* When a new transport connection is accepted, the server sets
`registered_method_matcher_cb` along with `accept_stream_fn`. This
function checks whether the method is registered or not and sets the
RegisteredMethod matcher in the metadata for use later.
* Client Changes -
* Set the metadata on call creation on whether the method is registered
or not.
Working towards testing against CSM Observability. Added ability to
register a prometheus exporter with our Opentelemetry plugin. This will
allow our metrics to be available at the standard prometheus port
`:9464`.
We have a bunch of experiments testing against core e2e - and this is
good for robustness, bad for CI times.
We also have a bunch of marginal but overall necessary fixtures in the
e2e suites - again good for robustness, bad for CI times.
We can eliminate some of the cross product though, and I think safely:
run experiments on a broad range of suites, but not *ALL* the suites,
and get a bunch of our CI time back.
Here I introduce an environment variable: `GRPC_CI_EXPERIMENTS` that's
set when running bazel @experiment= configs, cleared otherwise (so we
can still execute those tests directly when necessary). When that env
var is set we filter out a bunch of suites from the test configurations.