<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Earlier, we were simply using a 64 bit random number, but the spec
actually calls for UUIDv4.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Fix the end timestamp of `grpc.io/client/roundtrip_latency` to when we
receive the trailers from the previous end timestamp of when we clean-up
the call attempt.
The reason to do this is to make sure that `grpc.io/client/api_latency`
(the end-to-end latency for the call) is always greater than
`grpc.io/client/roundtrip_latency` and fix the bug found by
@stanley-cheung
--
<br class="Apple-interchange-newline">
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR adds the view `grpc.io/client/api_latency` for GCP Observability
which aims to collect the end-to-end time taken by a call.
Changes made to support this -
1) A global interceptor factory registration is created for stats
plugins.
2) OpenCensus plugin now provides a new interceptor that's responsible
for collecting the new latency.
3) Gcp Observability registers this plugin.
4) A new OpenCensus measurement and view is created for api latency.
Note that this is internal as of now, since it's not clear if it should
be exposed as public experimental API. Leaving that decision for the
future.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR adds annotations to client attempt spans and server spans on
messages of the form -
* `Send message: 1026 bytes`
* `Send compressed message: 31 bytes` (if message was compressed)
* `Received message: 31 bytes`
* `Received decompressed message: 1026 bytes` (if message needed to be
decompressed)
Note that the compressed and decompressed annotations are not present if
compression/decompression was not performed.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
We currently take named metrics recorded per-request only. Instead we
should merge field-wise.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This change mostly aims to get OpenCensus to use the new
ServerCallTracer interface. Note that the interfaces nor the code are in
their final states. There are a bunch of moving pieces, but I thought
this might be a nice mid-step to check-in and make sure that our
internal traces can also work with these changes.
Overall changes -
1) call_tracer.h shows what the hierarchy of new call tracer interfaces
looks like. Open to renaming suggestions.
2) Moved most of the common interface between `CallAttemptTracer` and
`ServerCallTracer` into a common `CallTracerInterface`. We should be
able to eventually move `RecordReceivedTrailingMetadata` and `RecordEnd`
as well to these common interfaces, but it requires some additional
work.
3) The compression filter is now responsible for recording the recv and
send messages for both the subchannel call and the server, and adds in
ability to record compressed and decompressed messages as well.
4) The OpenCensus server filter now uses the new `ServerCallTracer`
interface, and so doesn't need to be a filter anymore.
5) A new ServerCallTracerFilter was added. Ideally, we should be able to
move it to the current connected filter, but it is in a bit of an
interesting state right now, so I would prefer making those changes in a
separate PR with Craig's eyes on it.
6) A new context element `GRPC_CONTEXT_CALL_TRACER_ANNOTATION_INTERFACE`
was created that replaces the old `GRPC_CONTEXT_CALL_TRACER`, and the
new `GRPC_CONTEXT_CALL_TRACER` is mainly to pass the `CallAttemptTracer`
down the stack. This should go away in the new promise-based world.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg"
height="34" align="absmiddle"
alt="Reviewable"/>](https://reviewable.io/reviews/grpc/grpc/32618)
<!-- Reviewable:end -->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
It is reported in https://github.com/grpc/grpc/issues/32356 that there
is a race on vptr for `UnimplementedAsyncRequest` which would cause
crashes for multi-threaded server if clients send unimplemented RPC
request to the server.
The cause is that the server requests a call for
`UnimplementedAsyncRequest` in its base class `GenericAsyncRequest` when
the `vptr` still points to the base class's `vtable`. If the call went
in and another server thread picks up the tag before the `vptr` points
back to the derived class's `vtable`, it would call the wrong virtual
function and also this is a data race. This fix makes the request of the
call inside the derived class's constructor.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
This filter was originally written only for the C++ wrapped layer, but
we have plans to use this for Python (and maybe other wrapped languages
too in the future.)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This reverts commit 0fc0384b5a.
Major changes: this code calls `GetDefaultEventEngine` once on Alarm
init instead of 7 times throughout.
I will run benchmarks to ensure b/237283941 is not reproduced.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: drfloob <drfloob@users.noreply.github.com>
For stats, the StackDriver/OpenCensus API allows setting the
MonitoredResource directly, so use that.
For tracing, there is no explicit MonitoredResource to use, so just
insert it into the attributes for a span.
This PR adds batching support for GCP Observability logging. So instead
of the naive creating a new RPC to cloud logging for each logging event,
we now batch the log events to meet one of the following requirements -
* Batch size of 1000
* Batch memory consumption of 1MB
* A timeout period of 1sec after which we flush the accumulated batch
irrespective of the size.
There can also be cases where for some reason the RPCs fail or the batch
just accumulates to a very large size(100000 entries or 10MB in size).
In such cases, we just log the events with gpr_log instead of just
continuing to accumulate.
Additionally, `GcpObservabilityClose()` has been added to gracefully
shut off logging where we block till all the currently logged events are
flushed. (We might be able to gracefully shut off stats and tracing in
the future too.)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This code is not plumbed through yet, but it provides the core
infrastructure needed to detect the proper GCP environment resources
needed to set up the labels/attributes/resources for stats, tracing and
logging.
Details on how the various environment resources are setup has been
derived by looking at java's cloud logging library and OpenTelemetry's
future plans. (Could be better explained in an offline review since some
links are internal).
Requesting @veblush for a full review and @markdroth for a structural
review.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The upb team wants to remove this particular bit of syntactic sugar from
the generated code. So instead of calling has_foo() when foo is a map
field, we call foo_size() and test the result against zero.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Currently, the peer name is returned with the completion of the
send_initial_metadata op, which does not make sense, because with
retries, we don't actually know the peer name until we complete the
recv_initial_metadata op. This PR changes our code to return the peer
string as an attribute of the recv_initial_metadata op, so that it is
not available to the application until that point. This change may be
user-visible, but since our API docs don't seem to guarantee exactly
when this data will be available, it's not technically a breaking
change.
Note that in the promise-based stack, we were already assuming that the
peer string would be returned as part of the recv_initial_metadata
batch, so this PR helps reduce risk for the promise conversion by making
this semantic change now, thus decoupling it from the promise
conversion.
I have also changed the representation of the string in the metadata
batch to be a `grpc_core::Slice` instead of a `std::string`, so that we
can just take a ref to the string held in the transport instead of
having to copy the whole string for every call.
Cleanup and remove ios cpp test cronet
To test manually:
./tools/bazel test //src/objective-c/tests:CppCronetTests
@sampajano
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
* Revert "Revert "Revert "Revert "server: introduce ServerMetricRecorder API and move per-call reporting from a C++ interceptor to a C-core filter (#32106)" (#32272)" (#32279)" (#32293)"
This reverts commit 1f960697c5.
* Do not create CallMetricRecorder if call is null.
* [channel_args] Use c++ channel args during channel init
Previously we were converting to C and then back to C++ for each
filter... this ought to save some CPU time during connection
establishment.
* Automated change: Fix sanity tests
* cpp channel filters
* Automated change: Fix sanity tests
* iwyu
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
* Revert "Revert "server: introduce ServerMetricRecorder API and move per-call reporting from a C++ interceptor to a C-core filter (#32106)" (#32272)"
This reverts commit deb1e25543.
* Fix by caching call metric recording stuff in async request
PR #32106 caused msan errors in some tests while de-referencing the
server object where async calls are active after the server is
destroyed. Instead cache the ServerMetricRecorder pointer.
* copyright headers fixed
* clang fixes.
* Gcp Observability: Lazily initialize channels post-init
* IWYU and fix build deps
* Run RegistryPostInit for client census filters too
* Remove unused function