mirror of https://github.com/grpc/grpc.git
commit
9ce1fbcae5
1 changed files with 122 additions and 0 deletions
@ -0,0 +1,122 @@ |
||||
gRPC Connectivity Semantics and API |
||||
=================================== |
||||
|
||||
This document describes the connectivity semantics for gRPC channels and the |
||||
corresponding impact on RPCs. We then discuss an API. |
||||
|
||||
States of Connectivity |
||||
---------------------- |
||||
|
||||
gRPC Channels provide the abstraction over which clients can communicate with |
||||
servers.The client-side channel object can be constructed using little more |
||||
than a DNS name. Channels encapsulate a range of functionality including name |
||||
resolution, establishing a TCP connection (with retries and backoff) and TLS |
||||
handshakes. Channels can also handle errors on established connections and |
||||
reconnect, or in the case of HTTP/2 GO_AWAY, re-resolve the name and reconnect. |
||||
|
||||
To hide the details of all this activity from the user of the gRPC API (i.e., |
||||
application code) while exposing meaningful information about the state of a |
||||
channel, we use a state machine with four states, defined below: |
||||
|
||||
CONNECTING: The channel is trying to establish a connection and is waiting to |
||||
make progress on one of the steps involved in name resolution, TCP connection |
||||
establishment or TLS handshake. This is the initial state for all channels upon |
||||
creation. |
||||
|
||||
READY: The channel has successfully established a connection all the way |
||||
through TLS handshake (or equivalent) and all subsequent attempt to communicate |
||||
have succeeded (or are pending without any known failure ). |
||||
|
||||
TRANSIENT_FAILURE: There has been some transient failure (such as a TCP 3-way |
||||
handshake timing out or a socket error). Channels in this state will eventually |
||||
switch to the CONNECTING state and try to establish a connection again. Since |
||||
retries are done with exponential backoff, channels that fail to connect will |
||||
start out spending very little time in this state but as the attempts fail |
||||
repeatedly, the channel will spend increasingly large amounts of time in this |
||||
state. For many non-fatal failures (e.g., TCP connection attempts timing out |
||||
because the server is not yet available), the channel may be stuck in this |
||||
state for an indefinitely large amount of time. |
||||
|
||||
FATAL_FAILURE: There has been a fatal failure and the channel will never |
||||
attempt to establish a connection again. (e.g., a server presenting an invalid |
||||
TLS certificate) |
||||
|
||||
Channels that enter this state never leave this state. |
||||
|
||||
The following table lists the legal transitions from one state to another and |
||||
corresponding reasons. Empty cells denote disallowed transitions. |
||||
|
||||
<table style='border: 1px solid black'> |
||||
<tr> |
||||
<th>From/To</th> |
||||
<th>CONNECTING</th> |
||||
<th>READY</th> |
||||
<th>TRANSIENT_FAILURE</th> |
||||
<th>FATAL_FAILURE</th> |
||||
</tr> |
||||
<tr> |
||||
<th>CONNECTING</th> |
||||
<td>Incremental progress during connection establishment</td> |
||||
<td>All steps needed to establish a connection succeeded</td> |
||||
<td>Any failure in any of the steps needed to establish connection</td> |
||||
<td>Fatal failure encountered while attempting a connection.</td> |
||||
</tr> |
||||
<tr> |
||||
<th>READY</th> |
||||
<td></td> |
||||
<td>Incremental successful communication on established channel.</td> |
||||
<td>Any failure encountered while expecting successful communication on |
||||
established channel.</td> |
||||
<td></td> |
||||
</tr> |
||||
<tr> |
||||
<th>TRANSIENT_FAILURE</th> |
||||
<td>Wait time required to implement (exponential) backoff is over.</td> |
||||
<td></td> |
||||
<td></td> |
||||
<td></td> |
||||
</tr> |
||||
<tr> |
||||
<th>FATAL_FAILURE</th> |
||||
<td></td> |
||||
<td></td> |
||||
<td></td> |
||||
<td></td> |
||||
</tr> |
||||
</table> |
||||
|
||||
|
||||
Channel State API |
||||
----------------- |
||||
|
||||
All gRPC libraries will expose a channel-level API method to poll the current |
||||
state of a channel. In C++, this method is called GetCurrentState and returns |
||||
an enum for one of the four legal states. |
||||
|
||||
All libraries should also expose an API that enables the application (user of |
||||
the gRPC API) to be notified when the channel state changes. Since state |
||||
changes can be rapid and race with any such notification, the notification |
||||
should just inform the user that some state change has happened, leaving it to |
||||
the user to poll the channel for the current state. |
||||
|
||||
The synchronous version of this API is: |
||||
|
||||
```cpp |
||||
bool WaitForStateChange(gpr_timespec deadline, ChannelState source_state); |
||||
``` |
||||
|
||||
which returns true when the state changes to something other than the |
||||
source_state and false if the deadline expires. Asynchronous and futures based |
||||
APIs should have a corresponding method that allows the application to be |
||||
notified when the state of a channel changes. |
||||
|
||||
Note that a notification is delivered every time there is a transition from any |
||||
state to any *other* state. On the other hand the rules for legal state |
||||
transition, require a transition from CONNECTING to TRANSIENT_FAILURE and back |
||||
to CONNECTING for every recoverable failure, even if the corresponding |
||||
exponential backoff requires no wait before retry. The combined effect is that |
||||
the application may receive state change notifications that appear spurious. |
||||
e.g., an application waiting for state changes on a channel that is CONNECTING |
||||
may receive a state change notification but find the channel in the same |
||||
CONNECTING state on polling for current state because the channel may have |
||||
spent infinitesimally small amount of time in the TRANSIENT_FAILURE state. |
Loading…
Reference in new issue