mirror of https://github.com/grpc/grpc.git
commit
ecd1d5449d
48 changed files with 290 additions and 516 deletions
@ -0,0 +1,197 @@ |
||||
# Transport Explainer |
||||
|
||||
@vjpai |
||||
|
||||
## Existing Transports |
||||
|
||||
[gRPC |
||||
transports](https://github.com/grpc/grpc/tree/master/src/core/ext/transport) |
||||
plug in below the core API (one level below the C++ or other wrapped-language |
||||
API). You can write your transport in C or C++ though; currently (Nov 2017) all |
||||
the transports are nominally written in C++ though they are idiomatically C. The |
||||
existing transports are: |
||||
|
||||
* [HTTP/2](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/chttp2) |
||||
* [Cronet](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/cronet) |
||||
* [In-process](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/inproc) |
||||
|
||||
Among these, the in-process is likely the easiest to understand, though arguably |
||||
also the least similar to a "real" sockets-based transport since it is only used |
||||
in a single process. |
||||
|
||||
## Transport stream ops |
||||
|
||||
In the gRPC core implementation, a fundamental struct is the |
||||
`grpc_transport_stream_op_batch` which represents a collection of stream |
||||
operations sent to a transport. (Note that in gRPC, _stream_ and _RPC_ are used |
||||
synonymously since all RPCs are actually streams internally.) The ops in a batch |
||||
can include: |
||||
|
||||
* send\_initial\_metadata |
||||
- Client: initate an RPC |
||||
- Server: supply response headers |
||||
* recv\_initial\_metadata |
||||
- Client: get response headers |
||||
- Server: accept an RPC |
||||
* send\_message (zero or more) : send a data buffer |
||||
* recv\_message (zero or more) : receive a data buffer |
||||
* send\_trailing\_metadata |
||||
- Client: half-close indicating that no more messages will be coming |
||||
- Server: full-close providing final status for the RPC |
||||
* recv\_trailing\_metadata: get final status for the RPC |
||||
- Server extra: This op shouldn't actually be considered complete until the |
||||
server has also sent trailing metadata to provide the other side with final |
||||
status |
||||
* cancel\_stream: Attempt to cancel an RPC |
||||
* collect\_stats: Get stats |
||||
|
||||
The fundamental responsibility of the transport is to transform between this |
||||
internal format and an actual wire format, so the processing of these operations |
||||
is largely transport-specific. |
||||
|
||||
One or more of these ops are grouped into a batch. Applications can start all of |
||||
a call's ops in a single batch, or they can split them up into multiple |
||||
batches. Results of each batch are returned asynchronously via a completion |
||||
queue. |
||||
|
||||
Internally, we use callbacks to indicate completion. The surface layer creates a |
||||
callback when starting a new batch and sends it down the filter stack along with |
||||
the batch. The transport must invoke this callback when the batch is complete, |
||||
and then the surface layer returns an event to the application via the |
||||
completion queue. Each batch can have up to 3 callbacks: |
||||
|
||||
* recv\_initial\_metadata\_ready (called by the transport when the |
||||
recv\_initial\_metadata op is complete) |
||||
* recv\_message\_ready (called by the transport when the recv_message op is |
||||
complete) |
||||
* on\_complete (called by the transport when the entire batch is complete) |
||||
|
||||
## Timelines of transport stream op batches |
||||
|
||||
The transport's job is to sequence and interpret various possible interleavings |
||||
of the basic stream ops. For example, a sample timeline of batches would be: |
||||
|
||||
1. Client send\_initial\_metadata: Initiate an RPC with a path (method) and authority |
||||
1. Server recv\_initial\_metadata: accept an RPC |
||||
1. Client send\_message: Supply the input proto for the RPC |
||||
1. Server recv\_message: Get the input proto from the RPC |
||||
1. Client send\_trailing\_metadata: This is a half-close indicating that the |
||||
client will not be sending any more messages |
||||
1. Server recv\_trailing\_metadata: The server sees this from the client and |
||||
knows that it will not get any more messages. This won't complete yet though, |
||||
as described above. |
||||
1. Server send\_initial\_metadata, send\_message, send\_trailing\_metadata: A |
||||
batch can contain multiple ops, and this batch provides the RPC response |
||||
headers, response content, and status. Note that sending the trailing |
||||
metadata will also complete the server's receive of trailing metadata. |
||||
1. Client recv\_initial\_metadata: The number of ops in one side of the batch |
||||
has no relation with the number of ops on the other side of the batch. In |
||||
this case, the client is just collecting the response headers. |
||||
1. Client recv\_message, recv\_trailing\_metadata: Get the data response and |
||||
status |
||||
|
||||
|
||||
There are other possible sample timelines. For example, for client-side streaming, a "typical" sequence would be: |
||||
|
||||
1. Server: recv\_initial\_metadata |
||||
- At API-level, that would be the server requesting an RPC |
||||
1. Server: recv\_trailing\_metadata |
||||
- This is for when the server wants to know the final completion of the RPC |
||||
through an `AsyncNotifyWhenDone` API in C++ |
||||
1. Client: send\_initial\_metadata, recv\_message, recv\_trailing\_metadata |
||||
- At API-level, that's a client invoking a client-side streaming call. The |
||||
send\_initial\_metadata is the call invocation, the recv\_message colects |
||||
the final response from the server, and the recv\_trailing\_metadata gets |
||||
the `grpc::Status` value that will be returned from the call |
||||
1. Client: send\_message / Server: recv\_message |
||||
- Repeat the above step numerous times; these correspond to a client issuing |
||||
`Write` in a loop and a server doing `Read` in a loop until `Read` fails |
||||
1. Client: send\_trailing\_metadata / Server: recv\_message that indicates doneness (NULL) |
||||
- These correspond to a client issuing `WritesDone` which causes the server's |
||||
`Read` to fail |
||||
1. Server: send\_message, send\_trailing\_metadata |
||||
- These correpond to the server doing `Finish` |
||||
|
||||
The sends on one side will call their own callbacks when complete, and they will |
||||
in turn trigger actions that cause the other side's recv operations to |
||||
complete. In some transports, a send can sometimes complete before the recv on |
||||
the other side (e.g., in HTTP/2 if there is sufficient flow-control buffer space |
||||
available) |
||||
|
||||
## Other transport duties |
||||
|
||||
In addition to these basic stream ops, the transport must handle cancellations |
||||
of a stream at any time and pass their effects to the other side. For example, |
||||
in HTTP/2, this triggers a `RST_STREAM` being sent on the wire. The transport |
||||
must perform operations like pings and statistics that are used to shape |
||||
transport-level characteristics like flow control (see, for example, their use |
||||
in the HTTP/2 transport). |
||||
|
||||
## Putting things together with detail: Sending Metadata |
||||
|
||||
* API layer: `map<string, string>` that is specific to this RPC |
||||
* Core surface layer: array of `{slice, slice}` pairs where each slice |
||||
references an underlying string |
||||
* [Core transport |
||||
layer](https://github.com/grpc/grpc/tree/master/src/core/lib/transport): list |
||||
of `{slice, slice}` pairs that includes the above plus possibly some general |
||||
metadata (e.g., Method and Authority for initial metadata) |
||||
* [Specific transport |
||||
layer](https://github.com/grpc/grpc/tree/master/src/core/ext/transport): |
||||
- Either send it to the other side using transport-specific API (e.g., Cronet) |
||||
- Or have it sent through the [iomgr/endpoint |
||||
layer](https://github.com/grpc/grpc/tree/master/src/core/lib/iomgr) (e.g., |
||||
HTTP/2) |
||||
- Or just manipulate pointers to get it from one side to the other (e.g., |
||||
In-process) |
||||
|
||||
## Requirements for any transport |
||||
|
||||
Each transport implements several operations in a vtbl (may change to actual |
||||
virtual functions as transport moves to idiomatic C++). |
||||
|
||||
The most important and common one is `perform_stream_op`. This function |
||||
processes a single stream op batch on a specific stream that is associated with |
||||
a specific transport: |
||||
|
||||
* Gets the 6 ops/cancel passed down from the surface |
||||
* Pass metadata from one side to the other as described above |
||||
* Transform messages between slice buffer structure and stream of bytes to pass |
||||
to other side |
||||
- May require insertion of extra bytes (e.g., per-message headers in HTTP/2) |
||||
* React to metadata to preserve expected orderings (*) |
||||
* Schedule invocation of completion callbacks |
||||
|
||||
There are other functions in the vtbl as well. |
||||
|
||||
* `perform_transport_op` |
||||
- Configure the transport instance for the connectivity state change notifier |
||||
or the server-side accept callback |
||||
- Disconnect transport or set up a goaway for later streams |
||||
* `init_stream` |
||||
- Starts a stream from the client-side |
||||
- (*) Server-side of the transport must call `accept_stream_cb` when a new |
||||
stream is available |
||||
* Triggers request-matcher |
||||
* `destroy_stream`, `destroy_transport` |
||||
- Free up data related to a stream or transport |
||||
* `set_pollset`, `set_pollset_set`, `get_endpoint` |
||||
- Map each specific instance of the transport to FDs being used by iomgr (for |
||||
HTTP/2) |
||||
- Get a pointer to the endpoint structure that actually moves the data |
||||
(wrapper around a socket for HTTP/2) |
||||
|
||||
## Book-keeping responsibilities of the transport layer |
||||
|
||||
A given transport must keep all of its transport and streams ref-counted. This |
||||
is essential to make sure that no struct disappears before it is done being |
||||
used. |
||||
|
||||
A transport must also preserve relevant orders for the different categories of |
||||
ops on a stream, as described above. A transport must also make sure that all |
||||
relevant batch operations have completed before scheduling the `on_complete` |
||||
closure for a batch. Further examples include the idea that the server logic |
||||
expects to not complete recv\_trailing\_metadata until after it actually sends |
||||
trailing metadata since it would have already found this out by seeing a NULL’ed |
||||
recv\_message. This is considered part of the transport's duties in preserving |
||||
orders. |
@ -1,137 +0,0 @@ |
||||
/*
|
||||
* |
||||
* Copyright 2015 gRPC authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
* |
||||
*/ |
||||
|
||||
#include "src/core/lib/support/stack_lockfree.h" |
||||
|
||||
#include <stdlib.h> |
||||
#include <string.h> |
||||
|
||||
#include <grpc/support/alloc.h> |
||||
#include <grpc/support/atm.h> |
||||
#include <grpc/support/log.h> |
||||
#include <grpc/support/port_platform.h> |
||||
|
||||
/* The lockfree node structure is a single architecture-level
|
||||
word that allows for an atomic CAS to set it up. */ |
||||
struct lockfree_node_contents { |
||||
/* next thing to look at. Actual index for head, next index otherwise */ |
||||
uint16_t index; |
||||
#ifdef GPR_ARCH_64 |
||||
uint16_t pad; |
||||
uint32_t aba_ctr; |
||||
#else |
||||
#ifdef GPR_ARCH_32 |
||||
uint16_t aba_ctr; |
||||
#else |
||||
#error Unsupported bit width architecture |
||||
#endif |
||||
#endif |
||||
}; |
||||
|
||||
/* Use a union to make sure that these are in the same bits as an atm word */ |
||||
typedef union lockfree_node { |
||||
gpr_atm atm; |
||||
struct lockfree_node_contents contents; |
||||
} lockfree_node; |
||||
|
||||
/* make sure that entries aligned to 8-bytes */ |
||||
#define ENTRY_ALIGNMENT_BITS 3 |
||||
/* reserve this entry as invalid */ |
||||
#define INVALID_ENTRY_INDEX ((1 << 16) - 1) |
||||
|
||||
struct gpr_stack_lockfree { |
||||
lockfree_node* entries; |
||||
lockfree_node head; /* An atomic entry describing curr head */ |
||||
}; |
||||
|
||||
gpr_stack_lockfree* gpr_stack_lockfree_create(size_t entries) { |
||||
gpr_stack_lockfree* stack; |
||||
stack = (gpr_stack_lockfree*)gpr_malloc(sizeof(*stack)); |
||||
/* Since we only allocate 16 bits to represent an entry number,
|
||||
* make sure that we are within the desired range */ |
||||
/* Reserve the highest entry number as a dummy */ |
||||
GPR_ASSERT(entries < INVALID_ENTRY_INDEX); |
||||
stack->entries = (lockfree_node*)gpr_malloc_aligned( |
||||
entries * sizeof(stack->entries[0]), ENTRY_ALIGNMENT_BITS); |
||||
/* Clear out all entries */ |
||||
memset(stack->entries, 0, entries * sizeof(stack->entries[0])); |
||||
memset(&stack->head, 0, sizeof(stack->head)); |
||||
|
||||
GPR_ASSERT(sizeof(stack->entries->atm) == sizeof(stack->entries->contents)); |
||||
|
||||
/* Point the head at reserved dummy entry */ |
||||
stack->head.contents.index = INVALID_ENTRY_INDEX; |
||||
/* Fill in the pad and aba_ctr to avoid confusing memcheck tools */ |
||||
#ifdef GPR_ARCH_64 |
||||
stack->head.contents.pad = 0; |
||||
#endif |
||||
stack->head.contents.aba_ctr = 0; |
||||
return stack; |
||||
} |
||||
|
||||
void gpr_stack_lockfree_destroy(gpr_stack_lockfree* stack) { |
||||
gpr_free_aligned(stack->entries); |
||||
gpr_free(stack); |
||||
} |
||||
|
||||
int gpr_stack_lockfree_push(gpr_stack_lockfree* stack, int entry) { |
||||
lockfree_node head; |
||||
lockfree_node newhead; |
||||
lockfree_node curent; |
||||
lockfree_node newent; |
||||
|
||||
/* First fill in the entry's index and aba ctr for new head */ |
||||
newhead.contents.index = (uint16_t)entry; |
||||
#ifdef GPR_ARCH_64 |
||||
/* Fill in the pad to avoid confusing memcheck tools */ |
||||
newhead.contents.pad = 0; |
||||
#endif |
||||
|
||||
/* Also post-increment the aba_ctr */ |
||||
curent.atm = gpr_atm_no_barrier_load(&stack->entries[entry].atm); |
||||
newhead.contents.aba_ctr = ++curent.contents.aba_ctr; |
||||
gpr_atm_no_barrier_store(&stack->entries[entry].atm, curent.atm); |
||||
|
||||
do { |
||||
/* Atomically get the existing head value for use */ |
||||
head.atm = gpr_atm_no_barrier_load(&(stack->head.atm)); |
||||
/* Point to it */ |
||||
newent.atm = gpr_atm_no_barrier_load(&stack->entries[entry].atm); |
||||
newent.contents.index = head.contents.index; |
||||
gpr_atm_no_barrier_store(&stack->entries[entry].atm, newent.atm); |
||||
} while (!gpr_atm_rel_cas(&(stack->head.atm), head.atm, newhead.atm)); |
||||
/* Use rel_cas above to make sure that entry index is set properly */ |
||||
return head.contents.index == INVALID_ENTRY_INDEX; |
||||
} |
||||
|
||||
int gpr_stack_lockfree_pop(gpr_stack_lockfree* stack) { |
||||
lockfree_node head; |
||||
lockfree_node newhead; |
||||
|
||||
do { |
||||
head.atm = gpr_atm_acq_load(&(stack->head.atm)); |
||||
if (head.contents.index == INVALID_ENTRY_INDEX) { |
||||
return -1; |
||||
} |
||||
newhead.atm = |
||||
gpr_atm_no_barrier_load(&(stack->entries[head.contents.index].atm)); |
||||
|
||||
} while (!gpr_atm_no_barrier_cas(&(stack->head.atm), head.atm, newhead.atm)); |
||||
|
||||
return head.contents.index; |
||||
} |
@ -1,46 +0,0 @@ |
||||
/*
|
||||
* |
||||
* Copyright 2015 gRPC authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
* |
||||
*/ |
||||
|
||||
#ifndef GRPC_CORE_LIB_SUPPORT_STACK_LOCKFREE_H |
||||
#define GRPC_CORE_LIB_SUPPORT_STACK_LOCKFREE_H |
||||
|
||||
#include <stddef.h> |
||||
|
||||
#ifdef __cplusplus |
||||
extern "C" { |
||||
#endif |
||||
|
||||
typedef struct gpr_stack_lockfree gpr_stack_lockfree; |
||||
|
||||
/* This stack must specify the maximum number of entries to track.
|
||||
The current implementation only allows up to 65534 entries */ |
||||
gpr_stack_lockfree* gpr_stack_lockfree_create(size_t entries); |
||||
void gpr_stack_lockfree_destroy(gpr_stack_lockfree* stack); |
||||
|
||||
/* Pass in a valid entry number for the next stack entry */ |
||||
/* Returns 1 if this is the first element on the stack, 0 otherwise */ |
||||
int gpr_stack_lockfree_push(gpr_stack_lockfree*, int entry); |
||||
|
||||
/* Returns -1 on empty or the actual entry number */ |
||||
int gpr_stack_lockfree_pop(gpr_stack_lockfree* stack); |
||||
|
||||
#ifdef __cplusplus |
||||
} |
||||
#endif |
||||
|
||||
#endif /* GRPC_CORE_LIB_SUPPORT_STACK_LOCKFREE_H */ |
@ -1,7 +1,7 @@ |
||||
<!-- This file is generated --> |
||||
<Project> |
||||
<PropertyGroup> |
||||
<GrpcCsharpVersion>1.8.0-dev</GrpcCsharpVersion> |
||||
<GrpcCsharpVersion>1.9.0-dev</GrpcCsharpVersion> |
||||
<GoogleProtobufVersion>3.3.0</GoogleProtobufVersion> |
||||
</PropertyGroup> |
||||
</Project> |
||||
|
@ -1,140 +0,0 @@ |
||||
/*
|
||||
* |
||||
* Copyright 2015 gRPC authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
* |
||||
*/ |
||||
|
||||
#include "src/core/lib/support/stack_lockfree.h" |
||||
|
||||
#include <stdlib.h> |
||||
|
||||
#include <grpc/support/alloc.h> |
||||
#include <grpc/support/log.h> |
||||
#include <grpc/support/sync.h> |
||||
#include <grpc/support/thd.h> |
||||
#include "test/core/util/test_config.h" |
||||
|
||||
/* max stack size supported */ |
||||
#define MAX_STACK_SIZE 65534 |
||||
|
||||
#define MAX_THREADS 32 |
||||
|
||||
static void test_serial_sized(size_t size) { |
||||
gpr_stack_lockfree* stack = gpr_stack_lockfree_create(size); |
||||
size_t i; |
||||
size_t j; |
||||
|
||||
/* First try popping empty */ |
||||
GPR_ASSERT(gpr_stack_lockfree_pop(stack) == -1); |
||||
|
||||
/* Now add one item and check it */ |
||||
gpr_stack_lockfree_push(stack, 3); |
||||
GPR_ASSERT(gpr_stack_lockfree_pop(stack) == 3); |
||||
GPR_ASSERT(gpr_stack_lockfree_pop(stack) == -1); |
||||
|
||||
/* Now add repeatedly more items and check them */ |
||||
for (i = 1; i < size; i *= 2) { |
||||
for (j = 0; j <= i; j++) { |
||||
GPR_ASSERT(gpr_stack_lockfree_push(stack, (int)j) == (j == 0)); |
||||
} |
||||
for (j = 0; j <= i; j++) { |
||||
GPR_ASSERT(gpr_stack_lockfree_pop(stack) == (int)(i - j)); |
||||
} |
||||
GPR_ASSERT(gpr_stack_lockfree_pop(stack) == -1); |
||||
} |
||||
|
||||
gpr_stack_lockfree_destroy(stack); |
||||
} |
||||
|
||||
static void test_serial() { |
||||
size_t i; |
||||
for (i = 128; i < MAX_STACK_SIZE; i *= 2) { |
||||
test_serial_sized(i); |
||||
} |
||||
test_serial_sized(MAX_STACK_SIZE); |
||||
} |
||||
|
||||
struct test_arg { |
||||
gpr_stack_lockfree* stack; |
||||
int stack_size; |
||||
int nthreads; |
||||
int rank; |
||||
int sum; |
||||
}; |
||||
|
||||
static void test_mt_body(void* v) { |
||||
struct test_arg* arg = (struct test_arg*)v; |
||||
int lo, hi; |
||||
int i; |
||||
int res; |
||||
lo = arg->rank * arg->stack_size / arg->nthreads; |
||||
hi = (arg->rank + 1) * arg->stack_size / arg->nthreads; |
||||
for (i = lo; i < hi; i++) { |
||||
gpr_stack_lockfree_push(arg->stack, i); |
||||
if ((res = gpr_stack_lockfree_pop(arg->stack)) != -1) { |
||||
arg->sum += res; |
||||
} |
||||
} |
||||
while ((res = gpr_stack_lockfree_pop(arg->stack)) != -1) { |
||||
arg->sum += res; |
||||
} |
||||
} |
||||
|
||||
static void test_mt_sized(size_t size, int nth) { |
||||
gpr_stack_lockfree* stack; |
||||
struct test_arg args[MAX_THREADS]; |
||||
gpr_thd_id thds[MAX_THREADS]; |
||||
int sum; |
||||
int i; |
||||
gpr_thd_options options = gpr_thd_options_default(); |
||||
|
||||
stack = gpr_stack_lockfree_create(size); |
||||
for (i = 0; i < nth; i++) { |
||||
args[i].stack = stack; |
||||
args[i].stack_size = (int)size; |
||||
args[i].nthreads = nth; |
||||
args[i].rank = i; |
||||
args[i].sum = 0; |
||||
} |
||||
gpr_thd_options_set_joinable(&options); |
||||
for (i = 0; i < nth; i++) { |
||||
GPR_ASSERT(gpr_thd_new(&thds[i], test_mt_body, &args[i], &options)); |
||||
} |
||||
sum = 0; |
||||
for (i = 0; i < nth; i++) { |
||||
gpr_thd_join(thds[i]); |
||||
sum = sum + args[i].sum; |
||||
} |
||||
GPR_ASSERT((unsigned)sum == ((unsigned)size * (size - 1)) / 2); |
||||
gpr_stack_lockfree_destroy(stack); |
||||
} |
||||
|
||||
static void test_mt() { |
||||
size_t size; |
||||
int nth; |
||||
for (nth = 1; nth < MAX_THREADS; nth++) { |
||||
for (size = 128; size < MAX_STACK_SIZE; size *= 2) { |
||||
test_mt_sized(size, nth); |
||||
} |
||||
test_mt_sized(MAX_STACK_SIZE, nth); |
||||
} |
||||
} |
||||
|
||||
int main(int argc, char** argv) { |
||||
grpc_test_init(argc, argv); |
||||
test_serial(); |
||||
test_mt(); |
||||
return 0; |
||||
} |
Loading…
Reference in new issue