Add an event manager that spawns threads just as much as it possibly
can... to expose TSAN to the myriad thread ordering problems in our code
base.
Next steps for this will be to add a new test mode for tsan + thready
event engine + a few other doodads to increase threads in the system
(party.cc in particular has a good place for a hook).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
(part of removing support for VS2017)
Also see https://github.com/grpc/grpc/pull/32649
Also see https://github.com/grpc/grpc/pull/32615
The switch to grpc-win2019 windows workers has already happened:
(cl/517400022).
Once this PR lands, I'll backport to 1.53.x branch as well (since that
release removes the VS2017 support).
This PR is a small code change with a lot of new test data.
[In OpenSSL, there are two flags that configure CRL checks. Coping
relevant
section:](https://www.openssl.org/docs/man1.0.2/man3/X509_VERIFY_PARAM_get_depth.html)
> - X509_V_FLAG_CRL_CHECK enables CRL checking for the certificate chain
leaf certificate. An error occurs if a suitable CRL cannot be found.
> - X509_V_FLAG_CRL_CHECK_ALL enables CRL checking for the entire
certificate chain.
We currently only set `X509_V_FLAG_CRL_CHECK`, so we will only ever
check if the leaf certificate is revoked. We should check the whole
chain. I am open to making this a user configuration if we want to do it
that way, but we certainly need to be able to check the whole chain.
So, this PR contains the small code change in
`ssl_transport_security.cc` to use the `X509_V_FLAG_CRL_CHECK_ALL` flag.
Then the rest of the changes are in tests. I've added all the necessary
files to have a chain built that looks as follows
`Root CA -> Revoked Intermediate CA -> Leaf Certificate`, and added a
test for this case as well.
You can verify that on master this new test will fail (i.e. the
handshake will succeed even though the intermediate CA is revoked) by
checking out this branch, running `git checkout master --
./src/core/tsi/ssl_transport_security.cc`, then running the test.
I also slightly reorganized test/core/tsi/test_creds/ so that the CRLs
are in their own directory, which is the way our API intends to accept
CRLs.
Added documentation on how to use `Grpc.Tools` MSBuild integration with
*unsupported* architectures - i.e. third party built `protoc` and
`grpc_csharp_plugin`.
Fixing TSAN data races of the kind -
https://source.cloud.google.com/results/invocations/c3f02253-0d0b-44e6-917d-07e4bc0d3d62/targets/%2F%2Ftest%2Fcpp%2Fperformance:writes_per_rpc_test@poller%3Dpoll/log
I think the original issue was the non-atomic load in
writes_per_rpc_test.cc but also making the change to use barriers
instead of relaxed atomics (probably unimportant but I'll prefer safety
in the absence of otherwise comments).
Could probably also use std::atomic but feeling a bit lazy to make the
changes throughout.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Update to
5501a1a255
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR adds the view `grpc.io/client/api_latency` for GCP Observability
which aims to collect the end-to-end time taken by a call.
Changes made to support this -
1) A global interceptor factory registration is created for stats
plugins.
2) OpenCensus plugin now provides a new interceptor that's responsible
for collecting the new latency.
3) Gcp Observability registers this plugin.
4) A new OpenCensus measurement and view is created for api latency.
Note that this is internal as of now, since it's not clear if it should
be exposed as public experimental API. Leaving that decision for the
future.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This PR adds annotations to client attempt spans and server spans on
messages of the form -
* `Send message: 1026 bytes`
* `Send compressed message: 31 bytes` (if message was compressed)
* `Received message: 31 bytes`
* `Received decompressed message: 1026 bytes` (if message needed to be
decompressed)
Note that the compressed and decompressed annotations are not present if
compression/decompression was not performed.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Reverts grpc/grpc#32637
Apologies for not catching this before submission.
I continue to think that IsWorkerThread as an EventEngine member
function is a poor API choice: it implies that the thread is a thread
for the referenced event engine, and that's not what we're returning,
and also rarely useful.
If we want this concept in the API it should be a static function with a
way of registering a thread into the set. I however firmly believe we'll
do better long term if we abandon the concept of being able to identify
from outside who created a thread and instead solve problems in other
ways.
Initial PR to establish a bazel dependency on
https://github.com/google/fuzztest, with which I'm planning on basing a
hardening program.
Casting a relatively wide net with reviewers: I'm genuinely interested
in feedback building up the docs, and general ergonomics of this change.
I've located relevant files in the `fuzztest/...` directory. The tests
only build with the `--config fuzztest` bazel argument for now (because
of needing C++17), so locating them separately keeps `bazel test
test/...` working as it does today. In a few years time, when we adopt
C++17, we'll be able to rationalize the test directories a little bit.
We'll need to add some kokoro jobs (maybe with this PR?) to execute the
relevant tests.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This was missed from https://github.com/grpc/grpc/pull/32596, resulting
in k8s teardown exiting early with:
```
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: OMITTED
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps \"psm-grpc-client\" not found","reason":"NotFound","details":{"name":"psm-grpc-client","group":"apps","kind":"deployments"},"code":404}
```
With this change, teardown resumes with:
> `k8s_base_runner.py:282] Deployment psm-grpc-client deletion failed:
Kubernetes API returned 404 Not Found: deployments.apps
"psm-grpc-client" not found`
- construct string for tracing in chttp2_transport only if tracing is
enabled
- use equivalent absl::StrCat/StrAppend instead of
absl::StrFormat/StrJoin for formatting showing up on cpu profiles
Fix incompatibilities identified when running adhoc runs on the new
custom win2019 image.
After merging this, it should be possible to switch to the new image
without breaking any tests.
- for most fixes I added a comment that explains why they're necessary.
- the new image won't have VS2015 installed, so I'm switching the protoc
artifact build to VS2017
This PR will need to be backported to older release branches to ensure
the windows tests continue working on those branches as well (IMHO I
haven't made any changes that would be difficult to backport and I tried
to keeps the diff as small as possible to avoid issues when
backporting).
After we switch to the new image (and all the windows tests are green),
we can incrementally move the builds that are still using VS2017 to
VS2019.
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
I'm opening this PR to (hopefully!) stimulate a discussion. In brief,
I'd like to amend the gRPC-Web protocol docs to encourage
implementations to follow HTTP semantics and compare HTTP field names
case-insensitively.
The gRPC-Web specification is a nicely-designed way for proxies to
expose standard HTTP/2 gRPC servers to clients using less
tightly-controlled HTTP stacks, such as web browsers. To serve that
goal, it seems valuable to have the gRPC-Web specification follow [RFC
9110 (HTTP
Semantics)](https://www.rfc-editor.org/rfc/rfc9110.html#name-field-names).
Like previous RFCs, 9110 specifies that "field names are
case-insensitive." However, the current gRPC-Web specification requires
that servers and proxies "use lower-case header/trailer names" on the
wire. In principle, mandating casing on the wire is normal for HTTP/2
and fine (if unusual) for HTTP/1.1; however, it encourages
implementations to violate HTTP semantics and require lower-case names
when _reading_ headers and trailers.
I'd like to loosen the gRPC-Web specification to permit any casing on
the wire for HTTP/1.1. I'd also like to emphasize that gRPC-Web
implementations ought to follow standard HTTP semantics when _reading_
fields and compare names case-insensitively. Implementations that can't
treat names case-insensitively without breaking backward compatibility
should instead normalize field names to lowercase. Among the
Google-maintained gRPC implementations, at least `grpc-go` and
`grpc-java` already compare names case-insensitively (even though
they're HTTP/2-only). `grpc-dart` does the opposite and compares names
case-sensitively. `grpc-web` is sometimes case-insensitive (when reading
`grpc-status` and `grpc-message` from trailers-only responses) and
sometimes case-sensitive (when hand-parsing a block of length-prefixed
trailers).
The proposed amendment does not affect the correctness of Envoy (which
may continue to use lower-case field names). It partially affects
`grpc-web`, which would require a small patch to always normalize names.
(Both patched and unpatched versions of `grpc-web` would work with
Envoy.) `grpc-dart` would need to either begin treating field names
case-insensitively or normalize names, depending on what's possible in
Dart without breaking backward compatibility.
Relates to https://github.com/improbable-eng/grpc-web/issues/228,
https://github.com/bufbuild/connect-go/issues/453/, and
https://github.com/grpc/grpc-dart/issues/594.
Looking for something else I made some test additions, code tweaks to
make `Poll<>` better.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Prior to this change, we invoke aycnio interceptors by converting them
to `iterator` first, which means we can only call them once before
they're exhausted.
This PR changes the implementation to use `list`, thus the interceptors
can be called multiple times.
This file changes every time the set of core files changes. This PR
leaves the `lang/python` label off of these PRs and therefore leaves
them out of our attention set.
We currently take named metrics recorded per-request only. Instead we
should merge field-wise.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
1. Make channel creation lazy. This allows test cases to update the
configuration before the connection is made.
2. Pass load reports tracker when creating the policy. This way other
test cases do not see any changes to ChannelArguments.
Using grpc_core::CoreConfiguration::RunWithSpecialConfiguration was
considered but did not work as it removes other builders setup prior to
starting the test cases.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This change mostly aims to get OpenCensus to use the new
ServerCallTracer interface. Note that the interfaces nor the code are in
their final states. There are a bunch of moving pieces, but I thought
this might be a nice mid-step to check-in and make sure that our
internal traces can also work with these changes.
Overall changes -
1) call_tracer.h shows what the hierarchy of new call tracer interfaces
looks like. Open to renaming suggestions.
2) Moved most of the common interface between `CallAttemptTracer` and
`ServerCallTracer` into a common `CallTracerInterface`. We should be
able to eventually move `RecordReceivedTrailingMetadata` and `RecordEnd`
as well to these common interfaces, but it requires some additional
work.
3) The compression filter is now responsible for recording the recv and
send messages for both the subchannel call and the server, and adds in
ability to record compressed and decompressed messages as well.
4) The OpenCensus server filter now uses the new `ServerCallTracer`
interface, and so doesn't need to be a filter anymore.
5) A new ServerCallTracerFilter was added. Ideally, we should be able to
move it to the current connected filter, but it is in a bit of an
interesting state right now, so I would prefer making those changes in a
separate PR with Craig's eyes on it.
6) A new context element `GRPC_CONTEXT_CALL_TRACER_ANNOTATION_INTERFACE`
was created that replaces the old `GRPC_CONTEXT_CALL_TRACER`, and the
new `GRPC_CONTEXT_CALL_TRACER` is mainly to pass the `CallAttemptTracer`
down the stack. This should go away in the new promise-based world.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg"
height="34" align="absmiddle"
alt="Reviewable"/>](https://reviewable.io/reviews/grpc/grpc/32618)
<!-- Reviewable:end -->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- Increase kubernetes library default for urlib3 retries to 10
- Add custom retry logic to all API calls made by framework.k8s
Custom retry logic handles various errors we're experienced over
two years, and based on ~140 failure reports:
1. Errors returned by the k8s API server itself:
- 401 Unauthorized
- 409 Conflict
- 429 Too Many Requests
- 500 Internal Server Error
2. Connection errors that might indicate k8s API server is temporarily
unavailable (such as a restart, upgrade, etc):
- All `NewConnectionError`s, f.e. "Connection timed out",
"Connection refused"
- All "connection aborted" `ProtocolError`s, f.e. "Remote end
closed connection without response", "Connection reset by peer"
ref b/178378578, b/258546394