<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
Initial PR to establish a bazel dependency on
https://github.com/google/fuzztest, with which I'm planning on basing a
hardening program.
Casting a relatively wide net with reviewers: I'm genuinely interested
in feedback building up the docs, and general ergonomics of this change.
I've located relevant files in the `fuzztest/...` directory. The tests
only build with the `--config fuzztest` bazel argument for now (because
of needing C++17), so locating them separately keeps `bazel test
test/...` working as it does today. In a few years time, when we adopt
C++17, we'll be able to rationalize the test directories a little bit.
We'll need to add some kokoro jobs (maybe with this PR?) to execute the
relevant tests.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This was missed from https://github.com/grpc/grpc/pull/32596, resulting
in k8s teardown exiting early with:
```
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: OMITTED
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps \"psm-grpc-client\" not found","reason":"NotFound","details":{"name":"psm-grpc-client","group":"apps","kind":"deployments"},"code":404}
```
With this change, teardown resumes with:
> `k8s_base_runner.py:282] Deployment psm-grpc-client deletion failed:
Kubernetes API returned 404 Not Found: deployments.apps
"psm-grpc-client" not found`
Fix incompatibilities identified when running adhoc runs on the new
custom win2019 image.
After merging this, it should be possible to switch to the new image
without breaking any tests.
- for most fixes I added a comment that explains why they're necessary.
- the new image won't have VS2015 installed, so I'm switching the protoc
artifact build to VS2017
This PR will need to be backported to older release branches to ensure
the windows tests continue working on those branches as well (IMHO I
haven't made any changes that would be difficult to backport and I tried
to keeps the diff as small as possible to avoid issues when
backporting).
After we switch to the new image (and all the windows tests are green),
we can incrementally move the builds that are still using VS2017 to
VS2019.
This is a big rewrite of global config.
It does a few things, all somewhat intertwined:
1. centralize the list of configuration we have to a yaml file that can
be parsed, and code generated from it
2. add an initialization and a reset stage so that config vars can be
centrally accessed very quickly without the need for caching them
3. makes the syntax more C++ like (less macros!)
4. (optionally) adds absl flags to the OSS build
This first round of changes is intended to keep the system where it is
without major changes. We pick up absl flags to match internal code and
remove one point of deviation - but importantly continue to read from
the environment variables. In doing so we don't force absl flags on our
customers - it's possible to configure grpc without the flags - but
instead allow users that do use absl flags to configure grpc using that
mechanism. Importantly this lets internal customers configure grpc the
same everywhere.
Future changes along this path will be two-fold:
1. Move documentation generation into the code generation step, so that
within the source of truth yaml file we can find all documentation and
data about a configuration knob - eliminating the chance of forgetting
to document something in all the right places.
2. Provide fuzzing over configurations. Currently most config variables
get stashed in static constants across the codebase. To fuzz over these
we'd need a way to reset those cached values between fuzzing rounds,
something that is terrifically difficult right now, but with these
changes should simply be a reset on `ConfigVars`.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This change mostly aims to get OpenCensus to use the new
ServerCallTracer interface. Note that the interfaces nor the code are in
their final states. There are a bunch of moving pieces, but I thought
this might be a nice mid-step to check-in and make sure that our
internal traces can also work with these changes.
Overall changes -
1) call_tracer.h shows what the hierarchy of new call tracer interfaces
looks like. Open to renaming suggestions.
2) Moved most of the common interface between `CallAttemptTracer` and
`ServerCallTracer` into a common `CallTracerInterface`. We should be
able to eventually move `RecordReceivedTrailingMetadata` and `RecordEnd`
as well to these common interfaces, but it requires some additional
work.
3) The compression filter is now responsible for recording the recv and
send messages for both the subchannel call and the server, and adds in
ability to record compressed and decompressed messages as well.
4) The OpenCensus server filter now uses the new `ServerCallTracer`
interface, and so doesn't need to be a filter anymore.
5) A new ServerCallTracerFilter was added. Ideally, we should be able to
move it to the current connected filter, but it is in a bit of an
interesting state right now, so I would prefer making those changes in a
separate PR with Craig's eyes on it.
6) A new context element `GRPC_CONTEXT_CALL_TRACER_ANNOTATION_INTERFACE`
was created that replaces the old `GRPC_CONTEXT_CALL_TRACER`, and the
new `GRPC_CONTEXT_CALL_TRACER` is mainly to pass the `CallAttemptTracer`
down the stack. This should go away in the new promise-based world.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg"
height="34" align="absmiddle"
alt="Reviewable"/>](https://reviewable.io/reviews/grpc/grpc/32618)
<!-- Reviewable:end -->
- Increase kubernetes library default for urlib3 retries to 10
- Add custom retry logic to all API calls made by framework.k8s
Custom retry logic handles various errors we're experienced over
two years, and based on ~140 failure reports:
1. Errors returned by the k8s API server itself:
- 401 Unauthorized
- 409 Conflict
- 429 Too Many Requests
- 500 Internal Server Error
2. Connection errors that might indicate k8s API server is temporarily
unavailable (such as a restart, upgrade, etc):
- All `NewConnectionError`s, f.e. "Connection timed out",
"Connection refused"
- All "connection aborted" `ProtocolError`s, f.e. "Remote end
closed connection without response", "Connection reset by peer"
ref b/178378578, b/258546394
`bazel query deps(//src/proto/...)` seems unnecessary (regenerated
projects are identical) and causes trouble with protobuf 22.x (since it
basically breaks `tools/buildgen/generate_projects.sh` run and that
makes upgrade experiments painful).
PSM Interop: Local dev various improvements
- Cleanup resources on ctrl+c
- Add startup probes to address the issue with port forwarding starting
before the workload listens on a port
- Remove misleading restartPolicy: it's silently ignored by k8s
- Extra debug message with port-forwarding command
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
- sort source files to ensure stable ordering
- generate one source file per line
together this should produce diffs that are much more readable by humans
when sources get added/removed to/from protobuf (and
make_grpcio_tools.py is used to regenerate).
First step in the modernization of our RBE stack (see
go/rbe-tech-debt-notes).
- Get rid of the deprecated rbe_autoconfig and start using
[rbe_configs_gen](https://github.com/bazelbuild/bazel-toolchains#rbe_configs_gen---cli-tool-to-generate-configs)
+ check in the generated toolchain configs.
- Switch from marketplace.gcr.io/google/rbe-ubuntu16-04 to
marketplace.gcr.io/google/rbe-ubuntu18-04 (this image is still not owned
by us, but at least it's newer and demonstrates how a switch to a newer
docker image is done).
- provide script for generating the linux RBE toolchain configs.
- cleanup RBE configuration in the bazelrc files used for remote build
PR #32215 added the verified root cert subject to the lower level
`tsi_peer`. This PR is a companion to that and completes the feature by
bubbling the information up to the `TsiCustomVerificationCheckRequest`
which is part of the user facing API for implementing custom
verification callbacks.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
There are potentially surprising deployment bugs that can cause `EMFILE`
to be hit. For example, file descriptor limits can be easily reached if
- the round robin LB policy is used
- the load balancer hands out an assignment with a lot of backends
- using debian's default 1024 file descriptor limit.
To make such problems more apparent, we can pay special attention to
this error and log ERROR when it happens.
Related: b/265199104
Third try for #32466.
This adds an interop client / server for GCP Observability integration
testing.
Everything is new here with no refactor. Plan is to get this in first
before trying to refactor out the flags.
This test has been occasionally failing on CI with "Bus Error" crashes
while requiring the grpc shared library.
These crashes have been unreproducible locally. Let's continue debugging
(b/266212253) but skip this on CI.
Upgrade boringssl to the latest "master-with-bazel"
- use the `'USE_HEADERMAP' => 'NO'` fix for ObjC
- update the key for asm optimizations on mac/apple in python's setup.py
This PR depends on monterey fixes here:
https://github.com/grpc/grpc/pull/32493 and the boringssl's build
simplification
https://boringssl-review.googlesource.com/c/boringssl/+/56465.
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
Make remaining objC jobs compatible with kokoro monterey workers and
prepare for boringssl upgrade.
The changes here are taken from https://github.com/grpc/grpc/pull/32357,
but they should be merged in a separate PR
(we need the changes to be able to upgrade to monterey anyway and
there's no reason to make the boringssl upgrade PR more complicated by
bundling more fixes into it).
I've checked that the grpc_basictests_objc_examples and
grpc_ios_binary_size are green if switched to monterey.
Unfortunately it's hard to make grpc_basictests_objc_examples pass on
both monterey and mojave, so I suggest merging this PR at the same time
as CL to upgrade the kokoro jobs to monterey.
- that way both PR and continuous runs will remain green
- older branches would need a backport anyway
---------
Co-authored-by: Hannah Shi <hannahshisfb@gmail.com>
This filter was originally written only for the C++ wrapped layer, but
we have plans to use this for Python (and maybe other wrapped languages
too in the future.)
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Looks like this was accidentally dropped from our build files in
https://github.com/grpc/grpc/pull/21929, which means that this test
hasn't actually been built or run in almost 3 years. Unsurprisingly
after all that time, I had to make some changes to the test to get it to
actually build.
With the `--copt="-std=c++14"` setting in the bazel.rc file as it is
today, MSVC builds have complained for every cc file:
```
cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
```
This adds thousands of lines of noise to Windows builds, and hides
useful warnings. Using the `/std:c++14` flag on MSVC (and clang-cl) gets
us the desired result.
Internal Windows builds will catch issues that we cannot yet catch in
OSS. This will be mostly remedied once we have clang-cl in our CI (See a
rough roadmap in https://github.com/grpc/grpc/pull/32448). For now, this
PR identifies folders where most Windows-specific code is developed, and
requires cherrypicks for PRs that touch anything inside those folders.
This PR also refactors gpr and gprpp source files to better isolate all
platform-specific code ~the Windows-only code~. ~I will reorganize the
other platform-specific files using this structure if there are no
objections.~
This subset of folders covers about half of the `#ifdef GPR_WINDOWS`
usages in gRPC, but nearly all of the actively-developed Windows code
locations.
This version broke backward compatibility in `plugin_pb2.py`, which is
presumably a relatively minor regression, since we have not yet heard
any complaints about it. This PR:
- Excludes `4.22.0` from installation
- _Includes_ protobuf pre-releases into testing so this can be caught
more quickly in the future.
When bad prereleases are caught, we can exclude them from testing in a
similar manner to this PR. We may eventually want to invest into a
system where we can define these bad versions centrally.
PHP7 build is failing, removing from CI while investigating the failure.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This code is not plumbed through yet, but it provides the core
infrastructure needed to detect the proper GCP environment resources
needed to set up the labels/attributes/resources for stats, tracing and
logging.
Details on how the various environment resources are setup has been
derived by looking at java's cloud logging library and OpenTelemetry's
future plans. (Could be better explained in an offline review since some
links are internal).
Requesting @veblush for a full review and @markdroth for a structural
review.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->