<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This change mostly aims to get OpenCensus to use the new
ServerCallTracer interface. Note that the interfaces nor the code are in
their final states. There are a bunch of moving pieces, but I thought
this might be a nice mid-step to check-in and make sure that our
internal traces can also work with these changes.
Overall changes -
1) call_tracer.h shows what the hierarchy of new call tracer interfaces
looks like. Open to renaming suggestions.
2) Moved most of the common interface between `CallAttemptTracer` and
`ServerCallTracer` into a common `CallTracerInterface`. We should be
able to eventually move `RecordReceivedTrailingMetadata` and `RecordEnd`
as well to these common interfaces, but it requires some additional
work.
3) The compression filter is now responsible for recording the recv and
send messages for both the subchannel call and the server, and adds in
ability to record compressed and decompressed messages as well.
4) The OpenCensus server filter now uses the new `ServerCallTracer`
interface, and so doesn't need to be a filter anymore.
5) A new ServerCallTracerFilter was added. Ideally, we should be able to
move it to the current connected filter, but it is in a bit of an
interesting state right now, so I would prefer making those changes in a
separate PR with Craig's eyes on it.
6) A new context element `GRPC_CONTEXT_CALL_TRACER_ANNOTATION_INTERFACE`
was created that replaces the old `GRPC_CONTEXT_CALL_TRACER`, and the
new `GRPC_CONTEXT_CALL_TRACER` is mainly to pass the `CallAttemptTracer`
down the stack. This should go away in the new promise-based world.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg"
height="34" align="absmiddle"
alt="Reviewable"/>](https://reviewable.io/reviews/grpc/grpc/32618)
<!-- Reviewable:end -->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
It is reported in https://github.com/grpc/grpc/issues/32356 that there
is a race on vptr for `UnimplementedAsyncRequest` which would cause
crashes for multi-threaded server if clients send unimplemented RPC
request to the server.
The cause is that the server requests a call for
`UnimplementedAsyncRequest` in its base class `GenericAsyncRequest` when
the `vptr` still points to the base class's `vtable`. If the call went
in and another server thread picks up the tag before the `vptr` points
back to the derived class's `vtable`, it would call the wrong virtual
function and also this is a data race. This fix makes the request of the
call inside the derived class's constructor.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
First step in the modernization of our RBE stack (see
go/rbe-tech-debt-notes).
- Get rid of the deprecated rbe_autoconfig and start using
[rbe_configs_gen](https://github.com/bazelbuild/bazel-toolchains#rbe_configs_gen---cli-tool-to-generate-configs)
+ check in the generated toolchain configs.
- Switch from marketplace.gcr.io/google/rbe-ubuntu16-04 to
marketplace.gcr.io/google/rbe-ubuntu18-04 (this image is still not owned
by us, but at least it's newer and demonstrates how a switch to a newer
docker image is done).
- provide script for generating the linux RBE toolchain configs.
- cleanup RBE configuration in the bazelrc files used for remote build
This check only works if all handshake RPCs have an OK status, and it's
racey e.g. if the client is cancelling handshake RPCs (being when an RPC
is cancelled, termination of the RPC at the client is asynchronous from
termination at the server, so the client can resume the queue before the
server RPC completes).
With iomgr, this test is effectively rate limited by ExecCtx and the
single thread running pollset_work, which results in thousands of tiny
writes happening before every read. A small set of _synchronous_ 8k
reads then dominate the read-side of the test. This is an efficient
balance.
With the Windows EventEngine, the fully asynchronous, multi-threaded
reads and writes end up alternating roughly 1:1, meaning that a read
callback is executed for every tiny handful of bytes, tens of thousands
of times. Compared to the Posix EventEngine, without things like TCP_INQ
and/or recvmsg's timeout, I don't know of any great signal for how much
data can safely be received in a batch (e.g., we don't want to wait for
data that will never come, and we don't want to run callbacks for 2
bytes over and over again if we have KB in the pipe).
I believe the Windows EventEngine is WAI. I can significantly improve
this test performance by artificially slowing the reader down (adding a
>= 1ms sleep), but I believe that improves this use case to the
detriment of all others.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
To support TPC feature for BYOID (3PI), we need to remove the validation
the pattern of impersonation endpoints, sts endpoints and token info
endpoints since they are different in TPC regions.
A security review is already passed at b/261634871
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
Discovered via `bazel test
--test_env=GRPC_EXPERIMENTS=event_engine_client
//test/core/iomgr:endpoint_pair_test`. CI experiments can be enabled
generally on Windows once a few fixes and improvements are completed.
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
Third try for #32466.
This adds an interop client / server for GCP Observability integration
testing.
Everything is new here with no refactor. Plan is to get this in first
before trying to refactor out the flags.
Avoids some compilation problems on older MSVC's, opens the door for
some future optimizations.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Refactor C++ interop test client flags into the common
`client_helper.h/cc`. This is needed by the observability testing PR
#32466
We need the `ABSL_DECLARE_FLAG` in the header file so that we can share
that across different implementation.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Alongside https://github.com/grpc/grpc/pull/32496, this makes this test
behave the same on all platforms.
FWIW, I verified this causes us to see the previous lock cycle problem
in https://github.com/grpc/grpc/pull/32491 on linux - originally that
lock cycle was only on mac, because of environmental differences between
mac and linux in CI.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This compiles for //:grpc, but not for tests yet.
It's the right approach though - @veblush hoping this is something you
can pick up and finish off.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This filter was originally written only for the C++ wrapped layer, but
we have plans to use this for Python (and maybe other wrapped languages
too in the future.)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
I've replaced all use of `InternalError` here because none of these
scenarios would necessarily merit a bug or outage report.
Identified in the fuchsia test suite: calling the Listener's
`on_shutdown` method with anything other than `absl::OkStatus()` would
fail some assertions in the Posix-specialized client test suite if the
Oracle were implemented similarly. It _should_ fail the same way in the
listener test suite, but the statuses are ignored. I've fixed that.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
I had some doubts about `Seq` debugging another problem, so expanded the
tests we have to try and isolate the problem (so far without success, so
I think the original problem was elsewhere).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
For stats, the StackDriver/OpenCensus API allows setting the
MonitoredResource directly, so use that.
For tracing, there is no explicit MonitoredResource to use, so just
insert it into the attributes for a span.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Enforce a minimum value for the `refresh_interval_sec_` for the
`FileWatcherCertificateProvider`. There have been issues found when this
is set to 0, and the security team discussed and agreed that 0 should
not be a valid value for this use-case.
I made the `refresh_interval_sec_` public to make it easy to test - I
didn't immediately see an easy way around this. I found `FRIEND_TEST`
exists for accessing private members, but I didn't see that used
anywhere in grpc. If there is a better solution to this, please let me
know.
This test is flaky only with iomgr, this fix will likely fix this.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Relands #32385 (reverted in #32419) with fixes.
The Windows build is clean on a test cherrypick: cl/511291828
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
Return `Timeout(kMaxHours, Unit::kHours)` if the value is about to
overflow in `DivideRoundingUp`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The `XdsFaultInjectionMaxFault` test has seen a few flakes since #32326
was merged. I believe the flakiness is caused by the fact that when a
large number of RPCs are queued up before the resolver result comes in,
those RPCs are now re-processed in parallel instead of sequentially,
which can cause us to delay more RPCs than we should due to the
`max_faults` setting. To fix this, we change the test to ensure that the
channel is connected (i.e., the resolver result has already been
returned) before we start sending a large number of concurrent RPCs.
Although this is the only test that I've seen flakes in, I've made this
same change consistently to all fault injection tests that are creating
a large number of concurrent RPCs, since the same flake could affect any
of them.