Looks like our MSAN build is just taking longer at the moment, so increase sharding to reduce per-shard runtime.
Closes#37780
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37780 from ctiller:flake-fightas-13 07c422e977
PiperOrigin-RevId: 676869265
This change appears to cause a deadlock.
This reverts commit 4b53c2bac3.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#37769
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37769 from yousukseung:python_atexit_rollback 9218d1066e
PiperOrigin-RevId: 676566209
In some rare occasions on Win machines (0,2-0,5%), the tests are stuck before the handshake when we execute `grpc_call_start_batch`. We receive OP_COMPLETE with `Deadline Exceeded {grpc_status:4}` for such cases. The PR bumps it from 5 to 30s (no flakes for --runs_per_test=1000).
Closes#37767
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37767 from erm-g:h2test 9d1208ee1f
PiperOrigin-RevId: 676531975
MetadataPluginCallCredentials passes a function pointer to C-core to be called during destruction. This function grabs GIL which may race between GIL destruction during process shutdown. Since GIL destruction happens after Python's exit handlers, we cana mark that Python is shutting down from an exit handler and don't grab GIL in this function afterwards using a C mutex.
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#37762
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37762 from yousukseung:gil-shutdown 945eca9c6a
PiperOrigin-RevId: 676471627
This log line was printing an incorrect new state, and that led me down a multi-hour garden path whereby I was fixing a bug that did not exist.
Closes#37750
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37750 from ctiller:flake-fightas-12 020dd7379c
PiperOrigin-RevId: 676253730
Previously, if we pulled server trailing metadata *before* the call was added to the client transport then we'd never call `on_done_` on the spine and consequently never remove the call from the map. This change fixes that edge case.
In fixing it, I noticed a state in `CallState` that was both complicating the fix and completely irrelevant because we respecced earlier this year to say that ServerTrailingMetadata processing cannot be asynchronous, so I'm removing that state also.
Closes#37749
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37749 from ctiller:flake-fightas-11 847814a286
PiperOrigin-RevId: 676246259
The gRPC Core API currently requires callers to provide initial metadata before trailing metadata. You can see the C++ Callback API do this bookkeeping, for example. There is an eventual goal to be able to provide these in any order, and have gRPC do the right thing, but core is not there yet.
The proxy fixture in our end2end tests had a rare scenario in which trailing metadata from the server would show up at the proxy before initial metadata. This is part of the proxy's job: to split up batches into singular-operations that can complete in any order. There was, however, a rare flake wherein trailing metadata would complete before initial metadata, and the result was both client and server waiting on each other to respond.
This change adds a way for the proxy to defer sending trailing metadata back to the client, until after initial metadata has been sent to the client. In my testing, this eliminates the flake I had been able to reproduce 1 in 10k times using a single test. It happened more frequently across the full set of tests in our CI test suites.
Closes#37738
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37738 from drfloob:fix-proxy-fixture 6e0d7b7e6f
PiperOrigin-RevId: 676026493
It's possible for `LogStateChange` to be called after the party is destroyed (one thread calls Unref, which calls LogStateChange after the atomic op; another calls Unref concurrently, frees the last ref, deletes the object)
Consequently, it's not safe to call `DebugTag()` here - remove that.
Closes#37748
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37748 from ctiller:flake-fightas-10 ddef399309
PiperOrigin-RevId: 675833221
Without this, we see GOAWAYs with "enter idle" irrespective of the reason being idleness or max connection age.
Closes#37709
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37709 from yashykt:ChannelIdleFilterMessage 236072e7e2
PiperOrigin-RevId: 675762380
`grpc/core/master/macos/grpc_basictests_python` test is flaky, all test cases passed by the test itself [failed with timeout](https://btx.cloud.google.com/invocations/33bdec48-e17b-431b-b473-c8a335e1650c/log):
```
2024-09-11 16:48:57,385 TIMEOUT: py38.asyncio.tests_aio.unit.done_callback_test.TestServerSideDoneCallback [pid=11128, time=486.5sec]
```
This PR does two things:
1. Increase timeout for test case from `480s` to `600s`.
2. Add timestamp to test starting and passing so we know which test case takes a long time to run.
Log before timestamp change:
```
Running tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_stream_stream
Running tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_stream_unary
Running tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_unary_stream
Running tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_unary_unary
Testing gRPC Python...
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_error_in_callback
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_error_in_handler
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_stream_stream
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_stream_unary
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_unary_stream
SUCCESS tests_aio.unit.done_callback_test.TestServerSideDoneCallback.test_unary_unary
```
Log after timestamp change:
```
Running tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_stream_stream
Running tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_stream_unary
Running tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_unary_stream
Running tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_unary_unary
[2024-09-12 10:47:13.877185]Testing gRPC Python...
[2024-09-12 10:47:13.901135]SUCCESS tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_add_after_done
[2024-09-12 10:47:13.933705]SUCCESS tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_stream_stream
[2024-09-12 10:47:13.962419]SUCCESS tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_stream_unary
[2024-09-12 10:47:13.990368]SUCCESS tests_aio.unit.done_callback_test.TestClientSideDoneCallback.test_unary_stream
```
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#37703
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37703 from XuanWang-Amos:add_test_timestamp 4e13d29021
PiperOrigin-RevId: 675703890
`grpc_distribtests_gcp_python` test is [failing with permission error](https://fusion2.corp.google.com/ci;ids=1930537984/kokoro/prod:grpc%2Fcore%2Fmaster%2Flinux%2Fgrpc_distribtests_gcp_python/activity/2c6d57dd-f504-4834-9fd6-845db47ede01/log):
```
ERROR: (gcloud.functions.deploy) ResponseError: status=[403], code=[Ok], message=[Permission 'run.services.setIamPolicy' denied on resource 'projects/grpc-testing/locations/us-central1/services/grpc-gcf-distribtest-3cd3cc11-1a8f-4b88-9c3b-220f0bdf9fdc' (or resource may not exist).]
```
In the logs, looks like we're creating gen2 functions by default:
```
As of this Cloud SDK release, new functions will be deployed as 2nd gen functions by default. This is equivalent to currently deploying new with the --gen2 flag. Existing 1st gen functions will not be impacted and will continue to deploy as 1st gen functions.
```
Since we don't need functions provided by gen2 functions and use `--no-gen2` fixed the permission issue, we're adding `--no-gen2` flag to our test.
### Test:
* Passed manually run: http://sponge/e52ffb74-5a12-4e67-9f95-4638390ef57c
<!--
If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.
If your pull request is for a specific language, please add the appropriate
lang label.
-->
Closes#37684
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37684 from XuanWang-Amos:fix_gcf 26588c0cc3
PiperOrigin-RevId: 675703636
This test has been flaking for a while with a WSAEACCESS error on the `bind` call.
Change the loop to only create on socket at a time (on Windows) to rule out something windows-specific is not liking the fact that we are opening multiple listen sockets on the same port.
Closes#37669
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37669 from apolcyn:change_loop ffa105ba46
PiperOrigin-RevId: 675172803
Party contains a state field that contains ref count *and some other things*... so RefIfNonZero needs to pay attention to only consider the ref count portion.
Closes#37726
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37726 from ctiller:flake-fightas-7 f2b3b3ff6d
PiperOrigin-RevId: 674514201
StartCall needs to be called from the party of the call being started, and so here we reorder things such that that's guaranteed.
PiperOrigin-RevId: 674471582
Move the problematic log to be a trace log (not needed for general workflows), and take the opportunity to clean up a few other errors to log every n seconds -- because we principally need the signal that it's happening, not every occurrence.
Closes#37721
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37721 from ctiller:flake-fightas-6 ac18237c51
PiperOrigin-RevId: 674404222
3a138e4369 renamed a bunch .c files to .c.inc, this PR adds them to boringssl podspec file.
Closes#37690
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37690 from HannahShiSFB:fix-objc-build-with-boring-ssl-0-0-37 407ce1464e
PiperOrigin-RevId: 674042284
The timer wasn't taking into account the time for one update, which is necessary to get predictable behavior
Closes#37691
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37691 from ctiller:flake-fightas-5 8fc8fb8398
PiperOrigin-RevId: 673731339
Issue noticed on xds_end2end_test and is made worse worse when reducing `xds_resource_does_not_exist_timeout_ms` to 500 and running it on tsan.
Closes#37678
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37678 from yashykt:XdsClientOnTimerDebugging 1d31e28d2c
PiperOrigin-RevId: 673479242
Pretty sure that the `LoadBalancedCall` instance is getting dropped before return on *some* path through `PickSubchannel`, although it's unclear to me which and it's hard to tell with this implementation.
Ensure that such an event cannot cause a crash by holding a ref to the object we need and calling through that.
This will be marginally worse performance per pick for now, but once work serializer dispatch lands everywhere the additional ref will disappear.
Closes#37663
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37663 from ctiller:flake-fightas-3 d12b2e0540
PiperOrigin-RevId: 673088001
As title.
Bump core version since there were API changes (to logging APIs) since the last release.
Closes#37661
COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/37661 from apolcyn:bump_core_version_202409092234 190b88342b
PiperOrigin-RevId: 672984841