Chiebot-Mirror/grpc - grpc - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Sergii Tkachenko	0d678a9551	[PSM Interop] URL Map graceful teardown (#33090 ) Similar to what we already do in other test suites: - Try cleaning up resources three times. - If unsuccessful, don't fail the test and just log the error. The cleanup script should be the one to deal with this. ref b/282081851	2 years ago
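The graceful teardown described in this commit (retry cleanup three times, then log rather than fail) could be sketched roughly as follows. This is a minimal illustration, not the test driver's actual code: `cleanup_with_retries` and its parameters are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger(__name__)


def cleanup_with_retries(cleanup, attempts=3):
    """Call cleanup() up to `attempts` times (hypothetical helper).

    Returns True on success. On repeated failure the error is only logged,
    so the calling test is not failed by a teardown problem.
    """
    for attempt in range(1, attempts + 1):
        try:
            cleanup()
            return True
        except Exception as error:  # teardown errors are deliberately swallowed
            logger.warning(
                "Cleanup attempt %d/%d failed: %s", attempt, attempts, error
            )
    logger.error(
        "Cleanup failed after %d attempts; leaving it to the cleanup script",
        attempts,
    )
    return False
```

The boolean return value is an addition for the example; it lets a caller record whether leftover resources are expected.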
Doug Fawley	b346737290	[interop] enable Go ORCA tests (#33082 )	2 years ago
Mark D. Roth	8fdfb22848	[JSON] generalize handling of RefCountedPtr<> (#33048 ) Also remove a check in the weighted_target LB policy that I somehow missed in #32932.	2 years ago
Craig Tiller	4674f2ccf7	[fuzz] Turn core end2end tests into fuzzers (#33013 ) Add a new binary that runs all core end2end tests in fuzzing mode. In this mode FuzzingEventEngine is substituted for the default event engine. This means that time is simulated, as is IO. The FEE gets control of callback delays also. In our tests the `Step()` function becomes, instead of a single call to `completion_queue_next`, a series of calls to that function and `FuzzingEventEngine::Tick`, driving forward the event loop until progress can be made. PR guide: --- New binaries `core_end2end_test_fuzzer` - the new fuzzer itself `seed_end2end_corpus` - a tool that produces an interesting seed corpus Config changes for safe fuzzing The implementation tries to use the config fuzzing work we've previously deployed in api_fuzzer to fuzz across experiments. Since some experiments are far too experimental to be safe in such fuzzing (and this will always be the case): - a new flag is added to experiments to opt-out of this fuzzing - a new hook is added to the config system to allow variables to re-write their inputs before setting them during the fuzz Event manager/IO changes Changes are made to the event engine shims so that tcp_server_posix can run with a non-FD carrying EventEngine. These are in my mind a bit clunky, but they work and they're in code that we expect to delete in the medium term, so I think overall the approach is good. Changes to time A small tweak is made to fix a bug initializing time for fuzzers in time.cc - we were previously failing to initialize `g_process_epoch_cycles` Changes to `Crash` A version that prints to stdio is added so that we can reliably print a crash from the fuzzer. Changes to CqVerifier Hooks are added to allow the top level loop to hook the verification functions with a function that steps time between CQ polls. 
Changes to end2end fixtures State machinery moves from the fixture to the test infra, to keep the customizations for fuzzing or not in one place. This means that fixtures are now just client/server factories, which is overall nice. It did necessitate moving some bespoke machinery into h2_ssl_cert_test.cc - this file is beginning to be problematic in borrowing parts but not all of the e2e test machinery. Some future PR needs to solve this. A cq arg is added to the Make functions since the cq is now owned by the test and not the fixture. Changes to test registration `TEST_P` is replaced by `CORE_END2END_TEST` and our own test registry is used as a first depot for test information. The gtest version of these tests: queries that registry to manually register tests with gtest. This ultimately changes the name of our tests again (I think for the last time) - the new names are shorter and more readable, so I don't count this as a regression. The fuzzer version of these tests: constructs a database of fuzzable tests that it can consult to look up a particular suite/test/config combination specified by the fuzzer to fuzz against. This gives us a single fuzzer that can test all 3k-ish fuzzing-ready tests and cross-pollinate configuration between them. Changes to test config The zero size registry stuff was causing some problems with the event engine feature macros, so instead I've removed those and used GTEST_SKIP in the problematic tests. I think that's the approach we move towards in the future. Which tests are included Configs that are compatible - those that do not do fd manipulation directly (these are incompatible with FuzzingEventEngine), and those that do not join threads on their shutdown path (as these are incompatible with our cq wait methodology). 
Each we can talk about in the future - fd manipulation would be a significant expansion of FuzzingEventEngine, and is probably not worth it, however many uses of background threads now should probably evolve to be EventEngine::Run calls in the future, and then would be trivially enabled in the fuzzers. Some tests currently fail in the fuzzing environment, a `SKIP_IF_FUZZING` macro is used for these few to disable them if in the fuzzing environment. We'll burn these down in the future. Changes to fuzzing_event_engine Changes are made to time: an exponential sweep forward is used now - this catches small time precision things early, but makes decade long timers (we have them) able to be used right now. In the future we'll just skip time forward to the next scheduled timer, but that approach doesn't yet work due to legacy timer system interactions. Changes to port assignment: we ensure that ports are legal numbers before assigning them via `grpc_pick_port_or_die`. A race condition between time checking and io is fixed. --------- Co-authored-by: ctiller <ctiller@users.noreply.github.com>	2 years ago
Hannah Shi	ad2a5dd355	[ObjC] Cf event engine client (#33034 ) Added `//:gpr_platform` to cf_engine_test to fix build_cleaner check in the previous merge. More details in https://github.com/grpc/grpc/pull/33027	2 years ago
Yingwei Fan	207f3ec865	[core] Add an assertion to catch the environment variable emptiness issue. (#32836 ) Co-authored-by: Yash Tibrewal <yashkt@google.com> Co-authored-by: Stanley Cheung <stanleycheung@google.com> Co-authored-by: AJ Heller <hork@google.com> Co-authored-by: Yijie Ma <yijiem.main@gmail.com> Co-authored-by: apolcyn <apolcyn@google.com> Co-authored-by: Jan Tattermusch <jtattermusch@google.com>	2 years ago
Sergii Tkachenko	c182e6b252	[PSM Interop] Allow `dev` TESTING_VERSION that doesn't override images (#33062 ) Resolve `TESTING_VERSION` to `dev-VERSION` when the job is initiated by a user, and not the CI. Override this behavior by setting `FORCE_TESTING_VERSION`. This solves the problem with the manual job runs executed against a WIP branch (e.g. a PR) overriding the tag of the CI-built image we use for daily testing. The `dev` and `dev-VERSION` "magic" values supported by the `--testing_version` flag: - `dev` and `dev-master` are treated as `master`: all `config.version_gte` checks resolve to `True`. - `dev-VERSION` is treated as `VERSION`: `dev-v1.55.x` is treated as simply `v1.55.x`. We do this so that when manually running jobs for old branches the feature skip check still works, and unsupported tests are skipped. This change will take care of all langs/branches, no backports needed. ref b/256845629	2 years ago
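The mapping of the "magic" `--testing_version` values listed in this commit can be illustrated with a short sketch. The function name `normalize_testing_version` is hypothetical and the real driver may implement this differently; only the three rules from the commit message are encoded here.

```python
def normalize_testing_version(testing_version):
    """Map the `dev` magic values onto the version used for skip checks.

    Hypothetical helper illustrating the rules in the commit message:
    - `dev` and `dev-master` are treated as `master`
    - `dev-VERSION` is treated as `VERSION`
    - anything else is left unchanged
    """
    if testing_version in ("dev", "dev-master"):
        return "master"
    if testing_version.startswith("dev-"):
        return testing_version[len("dev-"):]
    return testing_version
```

For example, `normalize_testing_version("dev-v1.55.x")` yields `"v1.55.x"`, so feature skip checks for an old branch behave as they would in CI.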
Sergii Tkachenko	ac4b2233e2	[PSM Interop] Readme: also recommend enabling logging and monitoring (#33059 ) Since we use Logs Explorer.	2 years ago
Esun Kim	dc95133140	[Deps] Upgrade Protobuf v23 (#32914 ) Upgrading Protobuf and Upb to v23.0	2 years ago
Mark D. Roth	1432fe4e4c	[JSON] make API public but experimental (#32987 ) This makes the JSON API visible as part of the C-core API, but in the `experimental` namespace. It will be used as part of various experimental APIs that we will be introducing in the near future, such as the audit logging API.	2 years ago
Craig Tiller	f1bba530a5	[infra] Make tools/buildgen/generate_projects.py non-executable (#33042 ) This file does not contain a shebang, and whenever I try and run it it wedges my console into some weird state. There's a .sh file with the same name that should be run instead. Remove the executable bit of the thing we shouldn't run directly so we, like, don't.	2 years ago
Sergii Tkachenko	448084c186	[PSM Interop] Improve error messages in the affinity test (#33031 ) Previously the error message didn't provide much context, example: ```py Traceback (most recent call last): File "/tmpfs/tmp/tmp.BqlenMyXyk/grpc/tools/run_tests/xds_k8s_test_driver/tests/affinity_test.py", line 127, in test_affinity self.assertLen( AssertionError: [] has length of 0, expected 1. ``` ref b/279990584.	2 years ago
AJ Heller	3fb738b9b1	[EventEngine] Implement work-stealing in the EventEngine ThreadPool (#32869 ) This PR implements a work-stealing thread pool for use inside EventEngine implementations. Because of historical risks here, I've guarded the new implementation behind an experiment flag: `GRPC_EXPERIMENTS=work_stealing`. Current default behavior is the original thread pool implementation. Benchmarks look very promising: ``` bazel test \ --test_timeout=300 \ --config=opt -c opt \ --test_output=streamed \ --test_arg='--benchmark_format=csv' \ --test_arg='--benchmark_min_time=0.15' \ --test_arg='--benchmark_filter=_FanOut' \ --test_arg='--benchmark_repetitions=15' \ --test_arg='--benchmark_report_aggregates_only=true' \ test/cpp/microbenchmarks:bm_thread_pool ``` 2023-05-04: `bm_thread_pool` benchmark results on my local machine (64 core ThreadRipper PRO 3995WX, 256GB memory), comparing this PR to master: ![image](https://user-images.githubusercontent.com/295906/236315252-35ed237e-7626-486c-acfa-71a36f783d22.png) 2023-05-04: `bm_thread_pool` benchmark results in the Linux RBE environment (unsure of machine configuration, likely small), comparing this PR to master. ![image](https://user-images.githubusercontent.com/295906/236317164-2c5acbeb-fdac-4737-9b2d-4df9c41cb825.png) --------- Co-authored-by: drfloob <drfloob@users.noreply.github.com>	2 years ago
Xuan Wang	7f7a524a9a	[PSM Interop] Add install kubectl authentication plugin to README (#33029 ) --------- Co-authored-by: Sergii Tkachenko <hi@sergii.org>	2 years ago
Easwar Swaminathan	f68081767a	[Test] Add v1.55.0 release of grpc-go to interop matrix (#33022 ) I noticed that the [PR](https://github.com/grpc/grpc/pull/32683) to add v1.54.0 is still not merged. So, I added a line for that as well.	2 years ago
AJ Heller	ee0aaacbde	Revert "[ObjC] CF Stream Event Engine Client" (#33027 ) Reverts grpc/grpc#32924. This breaks the build again, unfortunately. From `test/core/event_engine/cf:cf_engine_test`: ``` error: module .../grpc/test/core/event_engine/cf:cf_engine_test does not depend on a module exporting 'grpc/support/port_platform.h' ``` @sampajano I recommend looking into CI tests to catch iOS problems before merging. We can enable EventEngine experiments in the CI generally once this PR lands, but this broken test is not one of those experiments. A normal build should have caught this. cc @HannahShiSFB	2 years ago
Hannah Shi	d0c1809840	[ObjC] CF Stream Event Engine Client (#32924 ) bazel build --config=macos --genrule_strategy=local --copt="-DGRPC_CFSTREAM=1" //test/cpp/end2end:cfstream_test succeeds Fixing failure described here: https://github.com/grpc/grpc/pull/32882#issuecomment-1512210309	2 years ago
Sergii Tkachenko	3f195380af	[PSM Interop] Bump the timeout for grpc_xds_k8s_lb_python job to 3h (#33019 ) The job run time was creeping to the 2h timeout. Let's bump it to 3h. Note that this is `master` branch, so it also includes the build time every time we commit to grpc/grpc. ref b/280784903	2 years ago
Craig Tiller	ad41fe96b6	[promises] Re-enable C++ end2end tests (with fixes) (#32837 ) Makes some awkward fixes to compression filter, call, connected channel to hold the semantics we have upheld now in tests. Once the fixes described here https://github.com/grpc/grpc/blob/master/src/core/lib/channel/connected_channel.cc#L636 are in this gets a lot less ad-hoc, but that's likely going to be post-landing promises client & server side. We specifically need special handling for server side cancellation in response to reads wrt the inproc transport - which doesn't track cancellation thoroughly enough itself. --------- Co-authored-by: ctiller <ctiller@users.noreply.github.com>	2 years ago
Craig Tiller	65a2a895af	[chttp2] Fix some fuzzer found bugs. (#33005 )	2 years ago
Luwei Ge	abc82b9e19	[Audit Logging] Audit logging support in authorization engines. (#32995 ) 1. `GrpcAuthorizationEngine` creates the logger from the given config in its ctor. 2. `Evaluate()` invokes audit logging when needed. --------- Co-authored-by: rockspore <rockspore@users.noreply.github.com>	2 years ago
Craig Tiller	0982f82f47	[fuzzing] Add fuzztest config (#32676 )	2 years ago
Yousuk Seung	8b02295e58	[xDS] Accept cpu_utilization over 100% (#32954 )	2 years ago
Jan Tattermusch	30b3d5061a	[test-infra] Switch RBE linux build to a new custom image rbe_ubuntu2004. (#32748 ) - Add a new docker image "rbe_ubuntu2004" that is built in a way that's analogous to how our other testing docker images are built (this gives us control over what exactly is contained in the docker image and ability to fine-tune our RBE configuration) - Switch RBE on linux to the new image (which gives us ubuntu20.04-based builds) For some reason, RBE seems to have trouble pulling the docker image from Google Artifact Registry (GAR), which is where our public testing images normally live, so for now, I used a workaround and I upload a copy of the rbe_ubuntu2004 docker image to GCR as well, and that makes RBE work just fine (see comment in the `renerate_linux_rbe_configs.sh` script). More followup items (config cleanup, getting local sanitizer builds working etc.) are in go/grpc-rbe-tech-debt-2023	2 years ago
Sergii Tkachenko	1267000bbb	[PSM Interop] Minor fixes to the `bin/cleanup_cluster.sh` helper (#32953 )	2 years ago
Michael Jarrett	c2d589c949	[build] Add Bazel user-defined build setting for `grpc_no_rls`. (#32930 ) This can be used to disable RLS to decrease binary size, on platforms that don't disable it automatically.	2 years ago
Yash Tibrewal	0dbe8bd37d	[Kokoro] Increase arm64 test timeout (#32950 )	2 years ago
Sergii Tkachenko	c0ee9ff4d5	[PSM Interop] Various improvements to the helper scripts (#32745 ) - Fix broken `bin/run_channelz.py` helper - Create `bin/run_ping_pong.py` helper that runs the baseline (aka "ping_pong") test against preconfigured infra - Setup automatic port forwarding when running `bin/run_channelz.py` and `bin/run_ping_pong.py` - Create `bin/cleanup_cluster.sh` helper to wipe out xds resources based on namespaces present on the cluster Note: this involves a small change to the non-helper code, but it's just moving the part that makes XdsTestServer/XdsTestClient instance for a given pod.	2 years ago
Luwei Ge	dcfc5d6904	[Audit Logging] Logger and factory APIs in C-Core and C++. (#32750 ) Audit logging APIs for both built-in loggers and third-party logger implementations. C++ uses using decls referring to C-Core APIs. --------- Co-authored-by: rockspore <rockspore@users.noreply.github.com>	2 years ago
Esun Kim	5037f38d0d	[Test] Bump the timeout of at-head tests to 6hr (#32939 ) To fix the timeout of protobuf-at-head test, we may want to reduce the test-load of these tests but let's fix the breakage first.	2 years ago
Luwei Ge	2917804b9a	[Audit Logging] Xds Audit Logger Registry (#32828 ) Third-party loggers will be added in subsequent PRs once the logger factory APIs are available to validate the configs here. This registry is used in `xds_http_rbac_filter.cc` to generate service config json.	2 years ago
Esun Kim	a8b787fae3	[Test] Used python 3.9 for python-alpine (#32931 ) Fix at-head tests (this is a missing piece of https://github.com/grpc/grpc/pull/32905) with the following error: ``` /var/local/git/grpc/tools/run_tests/helper_scripts/build_python.sh: line 126: python3.8: command not found ```	2 years ago
Yash Tibrewal	dc075539e7	[Release] Bump version to 1.56.0-dev (on master branch) (#32918 ) Change was created by the release automation script. See go/grpc-release	2 years ago
Sergii Tkachenko	2fe7b5b881	[PSM Interop] Temporary remedy for the issue with pod log dups (#32922 ) While a proper fix is on the way, this mitigates the number of duplicated container logs in the xds test server/client pod logs. The issue is that we only wait between stream restarts when an exception is caught, which isn't always the reason the stream gets broken. Another reason is the main container being shut down by k8s. In this situation, we essentially do ```py while True: try: restart_stream() read_all_logs_from_pod_start() except Exception: logger.warning('error') wait_seconds(1) ``` This PR makes it ```py while True: try: restart_stream() read_all_logs_from_pod_start() except Exception: logger.warning('error') finally: wait_seconds(5) ```	2 years ago
Esun Kim	3bd6c38650	[Infra] Use Bazel 6 and drop Bazel 4 (Part 2) (#32910 ) Oops I missed important changes from https://github.com/grpc/grpc/pull/32712. And it turned out that there are two problems that I couldn't fix at this point. - Windows Bazel RBE Linker Error: This may be caused by how the new Bazel 6 invokes the build tool chain but it's not clear. I put in a workaround to use Bazel 5 by using `OVERRIDE_BAZEL_VERSION=5.4.1` - Rule `rules_pods` to fetch CronetFramework from CocoaPod has an incompatibility with sort of built-in apple toolchain. (https://github.com/bazel-xcode/PodToBUILD/issues/232): I couldn't find a workaround to fix this so I ended up disabling all tests depending on this target.	2 years ago
Esun Kim	63ae99d36e	[Test] Fix git on Alpine (#32913 ) Fix `python_alpine` test failure with ``` fatal: detected dubious ownership in repository at '/var/local/jenkins/grpc' To add an exception for this directory, call: git config --global --add safe.directory /var/local/jenkins/grpc ```	2 years ago
Mark D. Roth	0c08ed77e0	[client channel] move health checking code out of subchannel and into LB policies (#32709 ) This paves the way for making pick_first the universal leaf policy (see #32692), which will be needed for the dualstack design. That change will require changing pick_first to see both the raw connectivity state and the health-checking connectivity state of a subchannel, so that we can enable health checking when pick_first is used underneath round_robin without actually changing the pick_first connectivity logic (currently, pick_first always disables health checking). To make it possible to do that, this PR moves the health checking code out of the subchannel and into a separate API using the same data-watcher mechanism that was added for ORCA OOB calls.	2 years ago
Esun Kim	c59fd2ed88	[Test] Upgrade Alpine Linux to 3.15 (#32905 ) Upgrade Alpine Linux version as 3.15 is the oldest supported version.	2 years ago
Yash Tibrewal	d299f5ecce	[Release] Bump core version to 32.0.0 for upcoming release (#32908 ) Change was created by the release automation script. See go/grpc-release	2 years ago
Larry Safran	4cb69f4658	[PSM Interop] Fix the issue with URL Map test suite not cleaning up failed test client (#32877 ) `tearDownClass` is not executed when `setUpClass` fails. In the URL Map test suite, this leads to a test client that failed to start not being cleaned up. This PR changes the URL Map test suite to register a custom `addClassCleanup` callback, instead of relying on `tearDownClass`. Unlike `tearDownClass`, cleanup callbacks are executed when `setUpClass` fails. ref b/276761453	2 years ago
Vignesh Babu	c515eba30b	[Transport] Update Chttp2 context list to include relative offset of traced RPCs within outgoing buffer (#32825 ) The PR also creates a separate BUILD target for: - chttp2 context list - iomgr buffer_list - iomgr internal errqueue This would allow the context list to be included as standalone dependencies for EventEngine implementations.	2 years ago
Esun Kim	b1c94e1dc5	[Infra] Fix make_grpcio_tools (#32904 ) When abseil changes, `make_grpcio_tools` needs to run to get an updated list of abseil files.	2 years ago
Craig Tiller	efa939ac1f	[cleanup] Remove public_headers_must_be_c89 test (#32898 ) We're starting to introduce C++ APIs to C-core, so this test is no longer relevant.	2 years ago
Craig Tiller	5cae7abd31	[transport] Move compression traits into a separate header. (#32895 ) We get a circular dependency problem otherwise if these are needed in `custom_metadata.h` (which they always are for that file to be useful).	2 years ago
Esun Kim	c523bdac1e	[C++] Added a cord support to gRPC protobuf serializer (#32617 ) As Protobuf is going to support Cord to reduce memory copy when [de]serializing Cord fields, gRPC is going to leverage it. This implementation is based on the internal one but it's slightly modified to use only the public APIs of Cord.	2 years ago
Jan Tattermusch	df5af05ce7	[test-infra] Sanity test job should only run sanity, not iwyu and clang-tidy (#32874 ) Followup for https://github.com/grpc/grpc/pull/31141. IWYU and clang-tidy have been "moved" to a separate kokoro job, but as it turns out the sanity job still runs all of `[sanity, clang-tidy, iwyu]`, which makes the grpc_sanity jobs very slow. The issue is that grpc_sanity selects tasks that have "sanity" label on them and as of now, clang-tidy and iwyu still do. It can be verified by: ``` tools/run_tests/run_tests_matrix.py -f sanity --dry_run Will run these tests: run_tests_sanity_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/sanity_linux_dbg_native/sponge_log.xml --report_suite_name sanity_linux_dbg_native -l sanity -c dbg --iomgr_platform native --report_multi_target" run_tests_clang-tidy_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/clang-tidy_linux_dbg_native/sponge_log.xml --report_suite_name clang-tidy_linux_dbg_native -l clang-tidy -c dbg --iomgr_platform native --report_multi_target" run_tests_iwyu_linux_dbg_native: "python3 tools/run_tests/run_tests.py --use_docker -t -j 2 -x run_tests/iwyu_linux_dbg_native/sponge_log.xml --report_suite_name iwyu_linux_dbg_native -l iwyu -c dbg --iomgr_platform native --report_multi_target" ``` This PR should fix this (by removing the umbrella "sanity" label from clang-tidy and iwyu)	2 years ago
Yash Tibrewal	bdae467be8	[Release] Add v1.54.0 to interop matrix (#32862 ) validation job - https://fusion2.corp.google.com/ci/kokoro/prod:grpc%2Fcore%2Fexperimental%2Flinux%2Fgrpc_interop_matrix_adhoc/activity/730b301a-e048-486a-98c2-6a1fe7f9b276/summary	2 years ago
Craig Tiller	5ac894a04f	Revert "[ObjC] CF EventEngine client" (#32882 ) Reverts grpc/grpc#32077	2 years ago
Hannah Shi	e9a592a00f	[ObjC] CF EventEngine client (#32077 ) @sampajano	2 years ago
Sergii Tkachenko	f2a7f6d51b	[PSM Interop] Increase k8s startup probe total time (#32875 ) Previously, we didn't configure the failureThreshold, so it used its default value. The final `startupProbe` looked like this: ```json { "startupProbe": { "failureThreshold": 3, "periodSeconds": 3, "successThreshold": 1, "tcpSocket": { "port": 8081 }, "timeoutSeconds": 1 } } ``` Because of it, the total time before k8s killed the container was 3 times `failureThreshold` * 3 seconds wait between probes `periodSeconds` = 9 seconds total (±3 seconds waiting for the probe response). This greatly affected the PSM Security test server, some implementations of which waited for the ADS stream to be configured before starting to listen on the maintenance port. This led to the server container being killed ~7 times before a successful startup: ``` 15:55:08.875586 "Killing container with a grace period" 15:53:38.875812 "Killing container with a grace period" 15:52:47.875752 "Killing container with a grace period" 15:52:38.874696 "Killing container with a grace period" 15:52:14.874491 "Killing container with a grace period" 15:52:05.875400 "Killing container with a grace period" 15:51:56.876138 "Killing container with a grace period" ``` These extra delays led to PSM security tests timing out. ref b/277336725	2 years ago
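The startup budget arithmetic in this commit (`failureThreshold` × `periodSeconds`) can be written out as a small sketch. Both function names are hypothetical, the calculation ignores the per-probe response wait mentioned in the commit, and the 120-second target in the usage note is an arbitrary example value, not the one chosen in the PR.

```python
import math


def startup_probe_budget_seconds(failure_threshold, period_seconds):
    """Approximate time k8s allows a container to start before killing it."""
    return failure_threshold * period_seconds


def min_failure_threshold(target_seconds, period_seconds):
    """Smallest failureThreshold giving at least `target_seconds` of budget."""
    return math.ceil(target_seconds / period_seconds)
```

With the defaults quoted above, `startup_probe_budget_seconds(3, 3)` gives 9 seconds. For a hypothetical 120-second target at the same 3-second period, `min_failure_threshold(120, 3)` gives 40.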


13644 Commits (498fc99479a01177aa2ad257f1390521495b0788)