From 9855a34e1af51211764fc96485f79ce7692dd1ce Mon Sep 17 00:00:00 2001 From: Eric Anderson Date: Mon, 11 May 2015 09:22:47 -0700 Subject: [PATCH 1/8] Add Connectivity doc This is a faithful representation of the current version of the Google Doc with only formatting changes. --- doc/connectivity-semantics-and-api.md | 122 ++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 doc/connectivity-semantics-and-api.md diff --git a/doc/connectivity-semantics-and-api.md b/doc/connectivity-semantics-and-api.md new file mode 100644 index 00000000000..842ff8adc18 --- /dev/null +++ b/doc/connectivity-semantics-and-api.md @@ -0,0 +1,122 @@ +gRPC Connectivity Semantics and API +=================================== + +This document describes the connectivity semantics for gRPC channels and the +corresponding impact on RPCs. We then discuss an API. + +States of Connectivity +---------------------- + +gRPC Channels provide the abstraction over which clients can communicate with +servers.The client-side channel object can be constructed using little more +than a DNS name. Channels encapsulate a range of functionality including name +resolution, establishing a TCP connection (with retries and backoff) and TLS +handshakes. Channels can also handle errors on established connections and +reconnect, or in the case of HTTP/2 GO_AWAY, re-resolve the name and reconnect. + +To hide the details of all this activity from the user of the gRPC API (i.e., +application code) while exposing meaningful information about the state of a +channel, we use a state machine with four states, defined below: + +CONNECTING: The channel is trying to establish a connection and is waiting to +make progress on one of the steps involved in name resolution, TCP connection +establishment or TLS handshake. This is the initial state for all channels upon +creation. + +READY: The channel has successfully established a connection all the way +through TLS handshake (or equivalent) and all subsequent attempt to communicate +have succeeded (or are pending without any known failure ). + +TRANSIENT_FAILURE: There has been some transient failure (such as a TCP 3-way +handshake timing out or a socket error). Channels in this state will eventually +switch to the CONNECTING state and try to establish a connection again. Since +retries are done with exponential backoff, channels that fail to connect will +start out spending very little time in this state but as the attempts fail +repeatedly, the channel will spend increasingly large amounts of time in this +state. For many non-fatal failures (e.g., TCP connection attempts timing out +because the server is not yet available), the channel may be stuck in this +state for an indefinitely large amount of time. + +FATAL_FAILURE: There has been a fatal failure and the channel will never +attempt to establish a connection again. (e.g., a server presenting an invalid +TLS certificate) + +Channels that enter this state never leave this state. + +The following table lists the legal transitions from one state to another and +corresponding reasons. Empty cells denote disallowed transitions. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
From/ToCONNECTINGREADYTRANSIENT_FAILUREFATAL_FAILURE
CONNECTINGIncremental progress during connection establishmentAll steps needed to establish a connection succeededAny failure in any of the steps needed to establish connectionFatal failure encountered while attempting a connection.
READYIncremental successful communication on established channel.Any failure encountered while expecting successful communication on + established channel.
TRANSIENT_FAILUREWait time required to implement (exponential) backoff is over.
FATAL_FAILURE
+ + +Channel State API +----------------- + +All gRPC libraries will expose a channel-level API method to poll the current +state of a channel. In C++, this method is called GetCurrentState and returns +an enum for one of the four legal states. + +All libraries should also expose an API that enables the application (user of +the gRPC API) to be notified when the channel state changes. Since state +changes can be rapid and race with any such notification, the notification +should just inform the user that some state change has happened, leaving it to +the user to poll the channel for the current state. + +The synchronous version of this API is: + +```cpp +bool WaitForStateChange(gpr_timespec deadline, ChannelState source_state); +``` + +which returns true when the state changes to something other than the +source_state and false if the deadline expires. Asynchronous and futures based +APIs should have a corresponding method that allows the application to be +notified when the state of a channel changes. + +Note that a notification is delivered every time there is a transition from any +state to any *other* state. On the other hand the rules for legal state +transition, require a transition from CONNECTING to TRANSIENT_FAILURE and back +to CONNECTING for every recoverable failure, even if the corresponding +exponential backoff requires no wait before retry. The combined effect is that +the application may receive state change notifications that appear spurious. +e.g., an application waiting for state changes on a channel that is CONNECTING +may receive a state change notification but find the channel in the same +CONNECTING state on polling for current state because the channel may have +spent infinitesimally small amount of time in the TRANSIENT_FAILURE state. From eb741d1772101852e8b5ff8e3d764691e2b62ab8 Mon Sep 17 00:00:00 2001 From: Jan Tattermusch Date: Thu, 11 Jun 2015 19:09:15 -0700 Subject: [PATCH 2/8] fix native callback signature to match with C# --- src/csharp/ext/grpc_csharp_ext.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/csharp/ext/grpc_csharp_ext.c b/src/csharp/ext/grpc_csharp_ext.c index 59b8993ad3d..83372182552 100644 --- a/src/csharp/ext/grpc_csharp_ext.c +++ b/src/csharp/ext/grpc_csharp_ext.c @@ -65,8 +65,6 @@ grpc_byte_buffer *string_to_byte_buffer(const char *buffer, size_t len) { return bb; } -typedef void(GPR_CALLTYPE *callback_funcptr)(gpr_int32 success, void *batch_context); - /* * Helper to maintain lifetime of batch op inputs and store batch op outputs. */ @@ -731,10 +729,12 @@ GPR_EXPORT void GPR_CALLTYPE grpcsharp_redirect_log(grpcsharp_log_func func) { gpr_set_log_function(grpcsharp_log_handler); } +typedef void(GPR_CALLTYPE *test_callback_funcptr)(gpr_int32 success); + /* For testing */ GPR_EXPORT void GPR_CALLTYPE -grpcsharp_test_callback(callback_funcptr callback) { - callback(1, NULL); +grpcsharp_test_callback(test_callback_funcptr callback) { + callback(1); } /* For testing */ From bc17b3f0aee4b071a8bcbba90ad77764d39e80f7 Mon Sep 17 00:00:00 2001 From: Jan Tattermusch Date: Tue, 9 Jun 2015 08:22:51 -0700 Subject: [PATCH 3/8] fetch PR refspec for docker jenkins builds --- tools/jenkins/run_jenkins.sh | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/jenkins/run_jenkins.sh b/tools/jenkins/run_jenkins.sh index a95819af601..3fa5914bc11 100755 --- a/tools/jenkins/run_jenkins.sh +++ b/tools/jenkins/run_jenkins.sh @@ -41,9 +41,17 @@ if [ "$platform" == "linux" ] then echo "building $language on Linux" + if [ "$ghprbPullId" != "" ] + then + # if we are building a pull request, grab corresponding refs. + FETCH_PULL_REQUEST_CMD="&& git fetch $GIT_URL +refs/pull/$ghprbPullId:refs/remotes/origin/pr/$ghprbPullId" + fi + # Run tests inside docker docker run grpc/grpc_jenkins_slave bash -c -l "git clone --recursive $GIT_URL /var/local/git/grpc \ - && cd /var/local/git/grpc && git checkout -f $GIT_COMMIT \ + && cd /var/local/git/grpc \ + $FETCH_PULL_REQUEST_CMD \ + && git checkout -f $GIT_COMMIT \ && git submodule update \ && nvm use 0.12 \ && rvm use ruby-2.1 \ From c3dd3bfb85c9456765814093a7dc9661950ba795 Mon Sep 17 00:00:00 2001 From: Jan Tattermusch Date: Fri, 12 Jun 2015 09:18:49 -0700 Subject: [PATCH 4/8] fix fetch to actually work --- tools/jenkins/run_jenkins.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/jenkins/run_jenkins.sh b/tools/jenkins/run_jenkins.sh index 3fa5914bc11..a754015f400 100755 --- a/tools/jenkins/run_jenkins.sh +++ b/tools/jenkins/run_jenkins.sh @@ -44,7 +44,7 @@ then if [ "$ghprbPullId" != "" ] then # if we are building a pull request, grab corresponding refs. - FETCH_PULL_REQUEST_CMD="&& git fetch $GIT_URL +refs/pull/$ghprbPullId:refs/remotes/origin/pr/$ghprbPullId" + FETCH_PULL_REQUEST_CMD="&& git fetch $GIT_URL refs/pull/$ghprbPullId/merge refs/pull/$ghprbPullId/head" fi # Run tests inside docker From 0ee84dc10feb0eeddc304d18638bbcf3faff8f04 Mon Sep 17 00:00:00 2001 From: Jan Tattermusch Date: Fri, 12 Jun 2015 11:22:49 -0700 Subject: [PATCH 5/8] only keep docker containers for tests that failed --- tools/jenkins/run_jenkins.sh | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/tools/jenkins/run_jenkins.sh b/tools/jenkins/run_jenkins.sh index a754015f400..57cec774248 100755 --- a/tools/jenkins/run_jenkins.sh +++ b/tools/jenkins/run_jenkins.sh @@ -47,15 +47,29 @@ then FETCH_PULL_REQUEST_CMD="&& git fetch $GIT_URL refs/pull/$ghprbPullId/merge refs/pull/$ghprbPullId/head" fi + # Make sure the CID file is gone. + rm -f docker.cid + # Run tests inside docker - docker run grpc/grpc_jenkins_slave bash -c -l "git clone --recursive $GIT_URL /var/local/git/grpc \ + docker run --cidfile=docker.cid grpc/grpc_jenkins_slave bash -c -l "git clone --recursive $GIT_URL /var/local/git/grpc \ && cd /var/local/git/grpc \ $FETCH_PULL_REQUEST_CMD \ && git checkout -f $GIT_COMMIT \ && git submodule update \ && nvm use 0.12 \ && rvm use ruby-2.1 \ - && tools/run_tests/run_tests.py -t -l $language" + && tools/run_tests/run_tests.py -t -l $language" || DOCKER_FAILED="true" + + DOCKER_CID=`cat docker.cid` + if [ "$DOCKER_FAILED" == "" ] + then + echo "Docker finished successfully, deleting the container $DOCKER_CID" + docker rm $DOCKER_CID + else + echo "Docker exited with failure, keeping container $DOCKER_CID." + echo "You can SSH to the worker and use 'docker start CID' and 'docker exec -i -t CID bash' to debug the problem." + fi + elif [ "$platform" == "windows" ] then echo "building $language on Windows" From b37307cfd864c315ea1de068990709b55cfb91ec Mon Sep 17 00:00:00 2001 From: Eric Dobson Date: Sun, 14 Jun 2015 10:49:43 -0700 Subject: [PATCH 6/8] Fix docs on gpr_slice_unref. --- include/grpc/support/slice.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/grpc/support/slice.h b/include/grpc/support/slice.h index 9026602f15f..3ee103bfb9c 100644 --- a/include/grpc/support/slice.h +++ b/include/grpc/support/slice.h @@ -110,8 +110,9 @@ gpr_slice gpr_slice_ref(gpr_slice s); /* Decrement the ref count of s. If the ref count of s reaches zero, all slices sharing the ref count are destroyed, and considered no longer initialized. If s is ultimately derived from a call to gpr_slice_new(start, - len, dest) where dest!=NULL , then (*dest)(start, len) is called. Requires - s initialized. */ + len, dest) where dest!=NULL , then (*dest)(start) is called, else if s is + ultimately derived from a call to gpr_slice_new_with_len(start, len, dest) + where dest!=NULL , then (*dest)(start, len). Requires s initialized. */ void gpr_slice_unref(gpr_slice s); /* Create a slice pointing at some data. Calls malloc to allocate a refcount From 255971d99a5c4cd447f2d97dbfc19155d659249a Mon Sep 17 00:00:00 2001 From: Eric Dobson Date: Sun, 14 Jun 2015 16:04:15 -0700 Subject: [PATCH 7/8] Fix docs on grpc_op_type. --- include/grpc/grpc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/grpc/grpc.h b/include/grpc/grpc.h index 62ee2a15051..73c30118a4a 100644 --- a/include/grpc/grpc.h +++ b/include/grpc/grpc.h @@ -221,7 +221,7 @@ typedef enum { GRPC_OP_SEND_INITIAL_METADATA = 0, /* Send a message: 0 or more of these operations can occur for each call */ GRPC_OP_SEND_MESSAGE, - /* Send a close from the server: one and only one instance MUST be sent from + /* Send a close from the client: one and only one instance MUST be sent from the client, unless the call was cancelled - in which case this can be skipped */ GRPC_OP_SEND_CLOSE_FROM_CLIENT, @@ -240,7 +240,7 @@ typedef enum { the status will indicate some failure. */ GRPC_OP_RECV_STATUS_ON_CLIENT, - /* Receive status on the server: one and only one must be made on the server + /* Receive close on the server: one and only one must be made on the server */ GRPC_OP_RECV_CLOSE_ON_SERVER } grpc_op_type; From be2a75c51e622a9ab3d913d95c8ae214b5530deb Mon Sep 17 00:00:00 2001 From: Oliver Smith Date: Sun, 14 Jun 2015 21:17:02 -0700 Subject: [PATCH 8/8] include string_util.h for the prototype of gpr_strdup and typo. Fixes two warnings in log_win32.c under msvc2015 log_win32.c(109): warning C4013: 'gpr_strdup' undefined; assuming extern returning int log_win32.c(109): warning C4047: 'return': 'char *' differs in levels of indirection from 'int' and corrects a typo in an error message --- src/core/support/log_win32.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/core/support/log_win32.c b/src/core/support/log_win32.c index 159c7e052c5..f878e6aaa17 100644 --- a/src/core/support/log_win32.c +++ b/src/core/support/log_win32.c @@ -42,6 +42,7 @@ #include #include #include +#include #include "src/core/support/string.h" #include "src/core/support/string_win32.h" @@ -106,7 +107,7 @@ char *gpr_format_message(DWORD messageid) { NULL, messageid, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)(&tmessage), 0, NULL); - if (status == 0) return gpr_strdup("Unable to retreive error string"); + if (status == 0) return gpr_strdup("Unable to retrieve error string"); message = gpr_tchar_to_char(tmessage); LocalFree(tmessage); return message;