Fix a crash on older iOS versions due to problematic thread-local
variable initialization.
See https://github.com/firebase/firebase-ios-sdk/issues/11509
Basically, there appears to be a bug in Xcode where it generates
assembly for thread-local variable initialization that is susceptible to
a crash. For example, on arm64 the generated assembly relies on
registers like x8 and x10 being preserved by the thread-local variable
initialization routine; however, in some cases this thread-local
variable initialization calls functions like
`ImageLoaderMachOCompressed::doBindFastLazySymbol` which clobber these
registers, leaving their values indeterminate when the caller resumes.
When those indeterminate values are later used as memory addresses they
are invalid and result in a crash.
This PR works around this bug by removing the `ScopedTimeCache` member
variable from the `ExecCtx` class on iOS. This is a reasonable
workaround because `ScopedTimeCache` is only a slight optimization for
data centers that entirely doesn't matter for mobile.
See https://github.com/dconeybe/TlsCrashIos12 for a demo of this crash.
Googlers see b/300501963 for full details.
We disabled this a little while ago for lack of CI bandwidth, but #34404
ought to have freed up enough capacity that we can keep running this.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Expand our fuzzing capabilities by allowing fuzzers to choose the bits
that go into random number distribution generators.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Passed manual run:
https://fusion2.corp.google.com/invocations/0e337438-8573-4b54-ba1c-242b1ad58586
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
There's an ongoing discussion on whether we should have API to disable
default metrics. Removing this API till we have a decision on that.
I'm keeping the internal API for enabling/disabling metrics on the OTel
plugin for now, just not exposing it publicly.
Reverts grpc/grpc#34361
Not sure why CI didn't capture it and why my previous test works, but
looks like it broke at least src/objective-c/examples/SwiftSample build.
Will see how CI goes in this PR.
Summary -
On the server-side, we are changing the point at which we decide whether
a method is registered or not from the surface to the transport at the
point where we are done receiving initial metadata and before we invoke
the recv_initial_metadata_ready closures from the filters. The main
motivation for this is to allow filters to check whether the incoming
method is a registered or not. The exact use-case is for observability
where we only want to record the method if it is registered. We store
the information about the registered method in the initial metadata.
On the client-side, we also set information about whether the method is
registered or not in the outgoing initial metadata.
Since we are effectively changing the lookup point of the registered
method, there are slight concerns of this being a potentially breaking
change, so we are guarding this with an experiment to be safe.
Changes -
* Transport API changes -
* Along with `accept_stream_fn`, a new callback
`registered_method_matcher_cb` will be sent down as a transport op on
the server side. When initial metadata is received on the server side,
this callback is invoked. This happens before invoking the
`recv_initial_metadata_ready` closure.
* Metadata changes -
* We add a new non-serializable metadata trait `GrpcRegisteredMethod()`.
On the client-side, the value is a uintptr_t with a value of 1 if the
call has a registered/known method, or 0, if it's not known. On the
server side, the value is a (ChannelRegisteredMethod*). This metadata
information can be used throughout the stack to check whether a call is
registered or not.
* Server Changes -
* When a new transport connection is accepted, the server sets
`registered_method_matcher_cb` along with `accept_stream_fn`. This
function checks whether the method is registered or not and sets the
RegisteredMethod matcher in the metadata for use later.
* Client Changes -
* Set the metadata on call creation on whether the method is registered
or not.
Bumping gcc 7 to 8 to workaround the ongoing gcc segfault problem when
building Protobuf C++. Currently Foundational C++ requires gcc 7 so this
is a temporary measure to make the test green. We need to either make a
decision to change the minimum version of gcc in the Foundational C++ or
find a way to support gcc 7 without gcc segfault soon.
Changes -
* CsmObservability doesn't need `SetTargetSelector`. Removed it.
* Added missing plumbing of `ServiceMeshLabelsInjector` in
`CsmObservability` to actually do the metadata exchange.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Changes -
* Use `grpc-c++` as the meter name and the proper version string when
creating the meter for OTel
* setting metric description and unit. (Pointed out by @DNVindhya )
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Lets us sever the dependency between stats & exec ctx (finally).
More work likely needs to go into the *mechanism* used here (I'm not a
fan of the per thread index), but that's also something we can address
later.
Support Python 3.12.
### Testing
* Passed all Distribution Tests.
* Also tested locally by installing 3.12 artifact.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
The `cancel_check_peer()` method is [always called with a non-OK
status](866fc41067/src/core/lib/security/transport/security_handshaker.cc (L560)),
since it's used only in cancellation cases. However, the implementation
of this method for TLS creds was bailing out if the status was non-OK,
meaning that `cancel_check_peer()` was never actually cancelling the
verification request. This bug seems to have been introduced back in
#25631, when the method was initially implemented.
I don't think we actually have any async verifier implementations today,
so this isn't actually causing a problem. I discovered this bug as part
of #34426, which was triggering the core e2e `no_logging` test to fail.
That test is designed to ensure that we don't generate any logs while
processing individual RPCs, since that would be bad for performance and
would flood logfiles. My PR caused a connection attempt to be cancelled
during the test, which triggered the error log that I am removing in
this PR.
Note that with this PR, the TLS creds `cancel_check_peer()` methods are
not actually doing anything with the status. Ideally, they should be
passing the status through to the verifier's `Cancel()` method, but we
apparently didn't add a parameter for that, which means that although
cancellation will work now, it will not properly pass through the right
error message. At some point, we should fix this and add tests covering
cancellation of async verifier requests to prove that the error message
is propagated correctly.
Bazelify tests from "linux/grpc_bazel_build" kokoro job by creating 3
bazelified tests - "build with strict warning", "build with no_xds=True"
and "build with no_xds=True negative test".
- also make the original "linux/grpc_bazel_build" kokoro job a no-op
(since bazelified tests now provide the same coverage).
The deleted code here was overriding the
[intended](866fc41067/tools/run_tests/run_tests.py (L62))
default test env of `GRPC_VERBOSITY=DEBUG`.
I'm just deleting it because it looks like`GRPC_TRACE=api` is not having
any affect anyways, since it relies on `GRPC_VERBOSITY=DEBUG` which it
happens to be unsetting.
If the client calls LookupHostname again within the on_resolve callback,
it re-acquires the `request_mu_` before releasing it which results in
deadlock.
With this PR it extracts the request and releases the lock before
calling on_resolve callback so it won't deadlock any more.
https://github.com/grpc/grpc/pull/33538 added `-weak_framework
CoreFoundation` in `DLDFLAGS` for only `arm64-darwin` builds, but the
issue reported in https://github.com/grpc/grpc/issues/33483 can also
happen on `x86-darwin` builds. This can happen if:
1. The Ruby interpreter is compiled without
`-Wl,-undefined,dynamic_lookup`.
2. This happens if the Ruby interpreter is built with XCode 14.0 to 14.2
(https://bugs.ruby-lang.org/issues/19005).
Simplify the logic and always include `-weak_framework CoreFoundation`
for macOS builds.
Changes -
* Remove `csm.remote_workload_pod_name` and
`csm.remote_workload_container_name`.
* Add `csm.remote_workload_name`, the value for which is sent through
MetadataExchange, from the `CSM_WORKLOAD_NAME` env var. (Note that this
is not added in local labels.)
* Add a local `csm.canonical_service` (@markdroth, please verify the key
that we want here) that is read from `CSM_CANONICAL_SERVICE_NAME` env
var, and we continue to send it over via MetadataExchange
- make C-core basictests use `--build_only` when running as bazelified
tests. This is because the volume of C core tests is expected to grow
very significantly after https://github.com/grpc/grpc/pull/34419 and
currently the non-bazelified counterpart of the tests (the presubmit
grpc_basictests_c_cpp_build_only job) is also "build only".
- make the linux presubmit job `grpc_basictests_c_cpp_build_only` a
noop, since the bazelified tests already give the same coverage on
presubmit.
Revert the reversion of the SSL_CTX_new change (#34355 reverted #34180 )
with a fix.
There was an issue with using `strcpy` on a `new[] string` in the
constructor of `ssl_credentials`. An ASAN test caught this in some CI
down the line - `ERROR: AddressSanitizer: alloc-dealloc-mismatch
(operator new [] vs free)`
That `strcpy` call was changed to `grp_strdup` which duplicates a string
in a way that can be freed by `gpr_free` and should resolve the ASAN
failure.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Added a separate distribtests for gRPC C++ DLL build on Windows. This
DLL build is a community support so it should be independently run from
the existing Windows distribtests. Actual DLL test will be added.
We're seeing some reports of the ping abuse policy not working like it
ought... add some tracing here to debug.
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This pull request adds another hook service on the maintenance server.
This will enable clients to gradually migrate from the standalone hook
server.
Changes:
1. Hook service can now be used separately.
2. Copied latest protos and updated the hook service to new API.
3. Added the hook service to the maintenance server.
Working towards testing against CSM Observability. Added ability to
register a prometheus exporter with our Opentelemetry plugin. This will
allow our metrics to be available at the standard prometheus port
`:9464`.
This fixes a deadlock seen when both the
`round_robin_delegate_to_pick_first` and
`client_channel_subchannel_wrapper_work_serializer_orphan` experiments
are enabled -- although I think the bug really has to do only with the
latter.
The problem here was that we were unreffing the picker while holding the
channel's LB mutex. Destroying the picker was triggering a
`SubchannelWrapper` to be orphaned, which triggered a hop into the
`WorkSerializer`. Once there, we were also running a queued subchannel
connectivity state notification, which triggered an update to the LB
policy, which triggered returning a new picker to the channel, which
tried to acquire the channel's LB mutex again.
Note that the `work_serializer_dispatch` experiment would have avoided
this problem.