Fixes the issue introduced in https://github.com/grpc/grpc/pull/33104,
where stopping the current run didn't reset `self.time_start_requested`,
`self.time_start_completed`, `self.time_start_stopped`. Because of this,
the subsetting test (the only one [redeploying the client
app](10001d16a9/tools/run_tests/xds_k8s_test_driver/tests/subsetting_test.py (L73C1-L74)))
started failing with:
```py
Traceback (most recent call last):
File "xds_k8s_test_driver/tests/subsetting_test.py", line 76, in test_subsetting_basic
test_client: _XdsTestClient = self.startTestClient(
File "xds_k8s_test_driver/framework/xds_k8s_testcase.py", line 615, in startTestClient
test_client = self.client_runner.run(server_target=test_server.xds_uri,
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_xds_client_runner.py", line 110, in run
super().run()
File "xds_k8s_test_driver/framework/test_app/runners/k8s/k8s_base_runner.py", line 112, in run
raise RuntimeError(
RuntimeError: Deployment psm-grpc-client: has already been started at 2023-05-27T13:47:15.262461
```
This PR:
1. Instead of relying on the `time_start_requested`,
`time_start_stopped` to produce GCP links, tracks the history run of
each deployment. This fixes the issue described above, and adds support
for listing all past runs executed by a k8s runner.
2. Minor: remove the unnecessary call to `test_client.cleanup()` when
there's no past deployment runs (e.g. at the first iteration of `for i
in range(_NUM_CLIENTS):`)
This is a hack to get around an issue on Apple devices caused by the
PosixEventEngine's `poll` poller not supporting fork. This PR disables
the EventEngine poller entirely in Python builds. It will therefore
prevent the release of the EventEngine generally, and prevent any
testing of EventEngine integration with gRPC Python, until the `poll`
poller is fixed.
Change was created by the release automation script. See go/grpc-release
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: Yash Tibrewal <yashkt@google.com>
Co-authored-by: Esun Kim <veblush@google.com>
Co-authored-by: Mark D. Roth <roth@google.com>
Co-authored-by: Yousuk Seung <ysseung@google.com>
Co-authored-by: Craig Tiller <ctiller@google.com>
Co-authored-by: Richard Belleville <rbellevi@google.com>
Co-authored-by: gnossen <gnossen@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
This is another attempt to add support for vsock in grpc since previous
PRs(#24551, #21745) all closed without merging.
The VSOCK address family facilitates communication between
virtual machines and the host they are running on.
This patch will introduce new scheme: [vsock:cid:port] to
support VSOCK address family.
Fixes#32738.
---------
Signed-off-by: Yadong Qi <yadong.qi@intel.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: YadongQi <YadongQi@users.noreply.github.com>
It's completely legitimate to have a zero in a filename/include guard.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Put enough internal delays into this test and it hits deadline
exceeded... extend the deadline to cover that.
(this is likely to become a common edit over the next few weeks...)
Added tests involve:
1. Checking the # of logger invocations with multiple RBACs in the
chain.
2. Verifying content in audit context with action and audit condition
permutations.
3. Confirm custom logger and built-in logger configurations are working.
4. Confirm the feature is protected by the environment variable.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
Allows usage on machines that don't support ipv4.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
For the details of the issue, please see #33157.
Add several paddings in struct grpc_comletion_quete to avoid the false
sharing.
I see performance improvement with the QPS tests.
cpp_protobuf_async_streaming_qps_unconstrained_1cq_insecure: ~2.9%
cpp_protobuf_async_unary_qps_unconstrained_1cq_insecure: ~7%
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
~5% flake introduced earlier this week, where the pool was not scaling
when draining work @ shutdown. Maintaining a busy thread count at
shutdown allows the lifeguard to notice when all threads are blocked,
and spawn new threads if necessary.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
We were previously trying to adjust the initial guess flow control
window *and* the adjustment methodology, which created some problems.
The latter is more important for us to get in, so don't change the
former for now (we can do that later).
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
recv_initial_metadata_ in `RetryFilter::CallAttempt` is deleted before
it's accessed in
`ClientChannel::FilterBasedLoadBalancedCall::RecvTrailingMetadataReady`
- instead of accessing it, pull out peer string and keep a ref
- switch to json_object_loader for config parsing
- use `absl::string_view` instead of `const char*` for cert provider
names
- change cert provider registry to use a map instead of a vector
- remove unused mesh_ca cert provider factory
Allow usage in production tasks
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
I generated a new client key and cert where a Spiffe ID is added as the
URI SAN. As such, we are able to test the audit log contains the
principal correctly.
Update: I switched to use the test logger to verify the log content and
removed stdout logger here because one the failure of [RBE Windows Debug
C/C++](https://source.cloud.google.com/results/invocations/c3187f41-bb1f-44b3-b2b1-23f38e47386d).
Update again: Refactored the test logger in a util such that the authz
engine test also uses the same logger. Subsequently, xDS e2e test will
also use it.
---------
Co-authored-by: rockspore <rockspore@users.noreply.github.com>
The PR does the following:
* Splits the single experiments.yaml file into two files:
experiments.yaml and rollouts.yaml.
* The experiments.yaml will now only include experiment definitions. The
default values of the experiments must now be specified in rollouts.yaml
* Removes the 'release' default value because it is not used.
* Adds an additional_constraints character string to ExperimentMetadata.
* Introduces a hook in src/core/lib/experiments/config.h to allow
registering arbitrary experiment constraint validation callbacks. These
callbacks would take an ExperimentMetadata object as input and return
the correct value to use for an experiment subject to additional
constraints.
Parties prefer to wakeup inline, however there are some mechanisms that
want an out-of-line wakeup (say due to a previously held mutex that may
be re-taken). To help those cases permit a guaranteed asynchronous
wakeup.
(needed now for resolver wakeups on the client call path)
We defaulted this on 5 months ago, and it seems to be working... let's
remove the experiment bit!
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
- Accept JSON null for any optional field.
- Do *not* accept JSON null for wrapper types (`absl::optional<>`,
`std::unique_ptr<>`, and `RefCountedPtr<>`) that are *not* marked as
optional fields.
As the [issue](https://github.com/grpc/grpc/issues/10136) documents, the
behavior of AsyncNotifyWhenDone is documented as:
"The comment on `AsyncNotifyWhenDone` states "Has to be called before
the rpc starts" but it seems that if the request tag is returned with
ok=false (i.e. because the CQ is shutting down) then the async done tag
is never received. Instead, I expect the async done tag to be received
regardless of whether or not an incoming call request was successfully
received."
The TODO item is marked closed as stale, and it seems unlikely this will
be resolved, without breaking
existing users whose code is written under the assumption that the tag
is not seen if the call never starts, so it may be time to documented
the idiosyncratic corner case and make it the expected behavior.
(This is a re-open PR for https://github.com/grpc/grpc/pull/32999, which
was closed accidentally due to the branch re-base and force-push)
Implement the frame serialization/deserialization method in chaotic-good
transport.
Previous comments from Craig:
- Since messages are not part of the framing system anymore, I think we
should remove ReceiveMessage (and therefore ReceivePadding) from this
type.
(instead we should add some helper functions to get the message lengths)
-- Resolved
- This approach will cause all frame manipulation code to know about
this serialization detail, rather than just the code that's serializing
it - I think it would be better to keep the type, flags separation (even
if we need to change the flags representation)
-- Done, changed back to type, flags separation.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
I have not been able to reproduce the non-empty pool @ shutdown bug in
around 200k runs of various kinds. Now that experiments are marked flaky
by default, any similar failures should not block PR submission, and
this will give me good signal if the bugs reproduce more frequently in
the CI environment.
I have a fix in theory, but I don't think it should be necessary. If the
bug reproduces, I'll try the fix.
This brings the `ThreadPoolTest.ForkStressTest` down from ~4s to < 1s,
by reducing the lower bound on worker thread wait time between fork
checks when no work is available.
Also changed lifetime of lifeguard's `thread_running_`. I don't believe
this fixes the rare TSAN race we've been seeing, but I believe this
range is more correct.
Will be used to evaluate experiment effects on memory usage once they're
toggled on.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
Make the change to improve branch prediction.
For the details, please see #33204.
With the change, the QPS increases by ~0.5% in
"cpp_protobuf_sync_streaming_qps_unconstrained_10mps_secure" test.
cpp_protobuf_sync_streaming_qps_unconstrained_10mps_secure: +0.515%
Most of these data structures need to scale a bit like per-cpu, but not
entirely. We can have more than one cpu hit the same instance in most
cases, and probably want to cap out before the hundreds of shards some
platforms have.
<!--
If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the
appropriate
lang label.
-->
---------
Co-authored-by: ctiller <ctiller@users.noreply.github.com>
This reverts the changes made to ring_hash in #29872. The comment in the
picker code specifically says not to change these variables to an
unsigned type, but that's exactly what that PR did. I don't know if this
has actually been causing any problems, but given that we duplicated
this algorithm (and that comment) from elsewhere without doing a
detailed analysis of it, it seems prudent to stick with the types that
the original code suggested were important.
To avoid causing problems for ObjC, I have changed this such that
ring_hash is not built unless we are building with xDS support, which we
exclude on mobile. Currently, there is no way to use ring_hash without
xDS, although that might change in the future; if it does, we can deal
with any problems that arise at that point.