@@ -0,0 +1,2 @@
daysUntilLock: 90
lockComment: false
@@ -0,0 +1,32 @@
# Polling Engine Usage on gRPC Client and Server

_Author: Sree Kuchibhotla (@sreecha) - Sep 2018_


This document describes how the polling engine is used in gRPC core, on both the client and server code paths.

## gRPC client

### Relation between Call, Channel (sub-channels), Completion queue, `grpc_pollset`
- A gRPC call is tied to a channel (more specifically, a sub-channel) and a completion queue for the lifetime of the call.
- Once a _sub-channel_ is picked for the call, the file descriptor (the socket fd in the case of TCP channels) is added to the pollset corresponding to the call's completion queue. (Recall that, as per [grpc-cq](grpc-cq.md), a completion queue has a pollset by default.)

![image](../images/grpc-call-channel-cq.png)


### Making progress on async `connect()` on sub-channels (`grpc_pollset_set` use case)
- A gRPC channel is created between a client and a 'target'. The 'target' may resolve into one or more backend servers.
- A sub-channel is the 'connection' from a client to one backend server.
- While establishing sub-channels (i.e., connections) to the backends, gRPC issues async [`connect()`](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/tcp_client_posix.cc#L296) calls which may not complete right away. When the `connect()` eventually succeeds, the socket fd is made 'writable'.
- This means that the polling engine must monitor all these sub-channel fds for writable events, and we need to make sure there is a polling thread that monitors all these fds.
- To accomplish this, `grpc_pollset_set` is used in the following way (see the picture below).

![image](../images/grpc-client-lb-pss.png)

## gRPC server

- The listening fd (i.e., the socket fd corresponding to the server's listening port) is added to each of the server completion queues. Note that gRPC uses the SO_REUSEPORT option and creates multiple listening fds, but all of them map to the same listening port.
- A new incoming channel is assigned to one of the server completion queues, currently picked by [round-robin](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/tcp_server_posix.cc#L231) over the server completion queues.

![image](../images/grpc-server-cq-fds.png)
@@ -0,0 +1,64 @@
# gRPC Completion Queue

_Author: Sree Kuchibhotla (@sreecha) - Sep 2018_


Code: [completion_queue.cc](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/surface/completion_queue.cc)

This document gives an overview of the completion queue architecture and focuses mainly on the interaction between the completion queue and the polling engine layer.

## Completion queue attributes

A completion queue has two attributes:

- Completion_type:
  - GRPC_CQ_NEXT: grpc_completion_queue_next() can be called (but not grpc_completion_queue_pluck())
  - GRPC_CQ_PLUCK: grpc_completion_queue_pluck() can be called (but not grpc_completion_queue_next())
  - GRPC_CQ_CALLBACK: The tags in the queue are function pointers to callbacks. Neither next() nor pluck() can be called on a queue of this type.

- Polling_type:
  - GRPC_CQ_NON_POLLING: Threads calling completion_queue_next/pluck do not do any polling
  - GRPC_CQ_DEFAULT_POLLING: Threads calling completion_queue_next/pluck do polling
  - GRPC_CQ_NON_LISTENING: Functionally similar to GRPC_CQ_DEFAULT_POLLING, except for a boolean attribute that marks the cq as non-listening. The gRPC server code uses this to avoid associating any listening sockets with this completion queue's pollset.


## Details

![image](../images/grpc-cq.png)


### **grpc\_completion\_queue\_next()** & **grpc\_completion\_queue\_pluck()** APIs

``` C++
grpc_completion_queue_next(cq, deadline) / pluck(cq, deadline, tag) {
  while (true) {
    // 1. If an event is queued in the completion queue, dequeue and return
    //    (in the case of pluck(), dequeue only if the tag is the one we are
    //    interested in)

    // 2. If the completion queue has been shut down, return

    // 3. In the case of pluck(), add the (tag, worker) pair to the
    //    tag<->worker map on the cq

    // 4. Call grpc_pollset_work(cq's-pollset, deadline) to do polling
    //    Note: if this call found some fds to be readable/writable/error,
    //    it would have scheduled those closures (which may queue completion
    //    events on SOME completion queue - not necessarily this one)
  }
}
```

### Queuing a completion event (i.e., a "tag")

``` C++
grpc_cq_end_op(cq, tag) {
  // 1. Queue the tag in the event queue

  // 2. Find the pollset corresponding to the completion queue
  //    (i)  If the cq is of type GRPC_CQ_NEXT, then kick ANY worker,
  //         i.e., call grpc_pollset_kick(pollset, nullptr)
  //    (ii) If the cq is of type GRPC_CQ_PLUCK, then search the tag<->worker
  //         map on the completion queue to find the worker. Then specifically
  //         kick that worker, i.e., call grpc_pollset_kick(pollset, worker)
}
```
@@ -0,0 +1,154 @@
# Polling Engines

_Author: Sree Kuchibhotla (@sreecha) - Sep 2018_


## Why do we need a 'polling engine'?

The polling engine component was created for the following reasons:

- gRPC code deals with a number of file descriptors on which events like the descriptor becoming readable/writable/error have to be monitored
- gRPC code knows the actions to perform when such events happen
  - For example:
    - `grpc_endpoint` code calls `recvmsg` when the fd is readable and `sendmsg` when the fd is writable
    - `tcp_client` connect code issues an async `connect` and finishes creating the client once the fd is writable (i.e., when the `connect` has actually finished)
- gRPC needed a component that can "efficiently" do the above operations __using the threads provided by the applications (i.e., without creating any new threads)__. By "efficiently" we mean optimized for latency and throughput


## Polling Engine Implementations in gRPC

There are multiple polling engine implementations depending on the OS and the OS version. Fortunately, all of them expose the same interface:

- Linux:
  - **`epollex`** (default, but requires kernel version >= 4.5)
  - `epoll1` (if `epollex` is not available and glibc version >= 2.9)
  - `poll` (if the kernel does not have epoll support)
  - `poll-cv` (if explicitly configured)
- Mac: **`poll`** (default), `poll-cv` (if explicitly configured)
- Windows: (no name)
- One-off polling engines:
  - AppEngine platform: **`poll-cv`** (default)
  - NodeJS: `libuv` polling engine implementation (requires different compile `#define`s)

## Polling Engine Interface

### Opaque Structures exposed by the polling engine

The following are the **opaque** structures exposed by the polling engine interface (NOTE: different polling engine implementations have different definitions of these structures):

- **grpc_fd:** Structure representing a file descriptor
- **grpc_pollset:** A set of one or more grpc_fds that are 'polled' for readable/writable/error events. One grpc_fd can be in multiple `grpc_pollset`s
- **grpc_pollset_worker:** Structure representing a 'polling thread' - more specifically, the thread that calls the `grpc_pollset_work()` API
- **grpc_pollset_set:** A group of `grpc_fd`s, `grpc_pollset`s and `grpc_pollset_set`s (yes, a `grpc_pollset_set` can contain other `grpc_pollset_set`s)

### Polling engine API

#### grpc_fd

- **grpc\_fd\_notify\_on\_[read|write|error]**
  - Signature: `grpc_fd_notify_on_(grpc_fd* fd, grpc_closure* closure)`
  - Register a [closure](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/closure.h#L67) to be called when the fd becomes readable/writable or has an error (in gRPC parlance, we refer to this act as "arming the fd")
  - The closure is called exactly once per event. I.e., once the fd becomes readable (or writable, or has an error), the closure is fired and the fd is 'unarmed'. To be notified again, the fd has to be armed again.

- **grpc_fd_shutdown**
  - Signature: `grpc_fd_shutdown(grpc_fd* fd)`
  - Any current (or future) closures registered for readable/writable/error events are scheduled immediately with an error

- **grpc_fd_orphan**
  - Signature: `grpc_fd_orphan(grpc_fd* fd, grpc_closure* on_done, int* release_fd, char* reason)`
  - Release the `grpc_fd` structure and call the `on_done` closure when the operation is complete
  - If `release_fd` is `nullptr`, then `close()` the underlying fd as well. If not, put the underlying fd in `release_fd` (and do not call `close()`)
    - `release_fd` is set to non-null in cases where the underlying fd is NOT owned by gRPC core (for example, the fds used by the c-ares DNS resolver)

#### grpc_pollset

- **grpc_pollset_add_fd**
  - Signature: `grpc_pollset_add_fd(grpc_pollset* ps, grpc_fd* fd)`
  - Add an fd to the pollset
    > **NOTE**: There is no `grpc_pollset_remove_fd`. This is because calling `grpc_fd_orphan()` will effectively remove the fd from all the pollsets it's a part of

- **grpc_pollset_work**
  - Signature: `grpc_pollset_work(grpc_pollset* ps, grpc_pollset_worker** worker, grpc_millis deadline)`
    > **NOTE**: `grpc_pollset_work()` requires the pollset mutex to be locked before calling it. Shortly after being called, the function populates the `*worker` pointer (among other things) and releases the mutex. Once `grpc_pollset_work()` returns, the `*worker` pointer is **invalid** and should not be used anymore. See the code in `completion_queue.cc` for how this is used.
  - Poll the fds in the pollset for events AND return when ANY of the following is true:
    - The deadline expired
    - Some fds in the pollset were found to be readable/writable/error and the associated closures were 'scheduled' (but not necessarily executed)
    - The worker was "kicked" (see `grpc_pollset_kick` for more details)

- **grpc_pollset_kick**
  - Signature: `grpc_pollset_kick(grpc_pollset* ps, grpc_pollset_worker* worker)`
  - "Kick the worker", i.e., force the worker to return from `grpc_pollset_work()`
  - If `worker == nullptr`, kick ANY worker active on that pollset

#### grpc_pollset_set

- **grpc\_pollset\_set\_[add|del]\_fd**
  - Signature: `grpc_pollset_set_[add|del]_fd(grpc_pollset_set* pss, grpc_fd* fd)`
  - Add/remove an fd to/from the `grpc_pollset_set`

- **grpc\_pollset\_set\_[add|del]\_pollset**
  - Signature: `grpc_pollset_set_[add|del]_pollset(grpc_pollset_set* pss, grpc_pollset* ps)`
  - What does adding a pollset to a pollset_set mean?
    - It means that calling `grpc_pollset_work()` on the pollset will also poll all the fds in the pollset_set, i.e., semantically it is similar to adding all the fds inside the pollset_set to the pollset.
    - This guarantee no longer holds once the pollset is removed from the pollset_set

- **grpc\_pollset\_set\_[add|del]\_pollset\_set**
  - Signature: `grpc_pollset_set_[add|del]_pollset_set(grpc_pollset_set* bag, grpc_pollset_set* item)`
  - Semantically, this is similar to adding all the fds in the 'bag' pollset_set to the 'item' pollset_set


#### Recap

__Relation between grpc_pollset_worker, grpc_pollset and grpc_fd:__

![image](../images/grpc-ps-pss-fd.png)

__grpc_pollset_set__

![image](../images/grpc-pss.png)


## Polling Engine Implementations

### epoll1

![image](../images/grpc-epoll1.png)

Code at `src/core/lib/iomgr/ev_epoll1_posix.cc`

- The logic to choose a designated poller is quite complicated. Pollsets are internally sharded into what are called `pollset_neighborhood`s (a structure internal to the `epoll1` polling engine implementation). `grpc_pollset_worker`s that call `grpc_pollset_work` on a given pollset are all queued in a linked list against the `grpc_pollset`. The head of the linked list is called the "root worker".

- There are as many neighborhoods as there are CPU cores. A pollset is put in a neighborhood based on the CPU core of the root worker thread. When picking the next designated poller, we always try to find another worker on the current pollset. If there are no more workers in the current pollset, the `pollset_neighborhood` list is scanned to pick the next pollset and worker that could be the new designated poller.
  - NOTE: There is room to tune this implementation. All we really need is a good way to maintain a list of `grpc_pollset_worker`s, with a way to group them per pollset (needed to implement `grpc_pollset_kick` semantics) and a way to randomly select a new designated poller.

- See the [`begin_worker()`](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/ev_epoll1_linux.cc#L729) function to see how a designated poller is chosen. Similarly, the [`end_worker()`](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/ev_epoll1_linux.cc#L916) function is called by the worker that just came out of `epoll_wait()` and has to choose a new designated poller.


### epollex

![image](../images/grpc-epollex.png)

Code at `src/core/lib/iomgr/ev_epollex_posix.cc`

- FDs are added to multiple epollsets with the EPOLLEXCLUSIVE flag. This prevents multiple worker threads from all waking up from polling whenever the fd becomes readable/writable

- A few observations:
  - If multiple pollsets are pointing to the same `Pollable`, then the `Pollable` MUST be either empty or of type `PO_FD` (i.e., single-fd)
  - A multi-pollable has one-and-only-one incoming link from a pollset
  - The same fd can be in multiple `Pollable`s (even if one of the `Pollable`s is of type PO_FD)
  - There cannot be two `Pollable`s of type PO_FD for the same fd

- Why do we need `Pollable`s of type PO_FD and PO_EMPTY?
  - The main reason is the sync client API
  - We create one new completion queue per call. If we didn't have the PO_EMPTY and PO_FD pollable types, then every call on a given channel would effectively have to create a `Pollable`, and hence an epollset. This is because every completion queue automatically creates a pollset, and the channel fd would have to be put in that pollset, which in turn requires an epollset to hold that fd. Creating an epollset per call (even if we delete the epollset once the call is completed) would mean a lot of syscalls to create/delete epoll fds, which is clearly not a good idea.
  - With these new types of `Pollable`s, all pollsets (corresponding to the new per-call completion queues) initially point to the global PO_EMPTY epollset. Then, once the channel fd is added to the pollset, the pollset points to the `Pollable` of type PO_FD containing just that fd (i.e., it reuses an existing `Pollable`). This way, the epoll fd creation/deletion churn is avoided.


### Other polling engine implementations (poll and the Windows polling engine)

- **poll** polling engine: gRPC's `poll` polling engine is quite complicated. It uses the `poll()` function to do the polling (and hence is for platforms like macOS where epoll is not available)
  - The implementation is further complicated by the fact that `poll()` is level-triggered (keep this in mind in case you wonder why the code at `src/core/lib/iomgr/ev_poll_posix.cc` is written in a certain, seemingly complicated, way :))

- **Polling engine on Windows**: The Windows polling engine looks nothing like the other polling engines
  - Unlike the gRPC polling engines for Unix systems (epollex, epoll1 and poll), the Windows endpoint implementation and polling engine implementation are very closely tied together
  - The Windows endpoint read/write API implementations use the Windows IO API, which requires specifying an [I/O completion port](https://docs.microsoft.com/en-us/windows/desktop/fileio/i-o-completion-ports)
  - In the Windows polling engine's grpc_pollset_work() implementation, ONE of the threads is chosen to wait on the I/O completion port while the other threads wait on a condition variable (much like the turnstile polling in epollex/epoll1)
@@ -1,39 +0,0 @@
/*
 * Copyright 2015 gRPC authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * WARNING: Auto-generated code.
 *
 * To make changes to this file, change
 * tools/codegen/core/gen_static_metadata.py, and then re-run it.
 *
 * This file contains the mapping from the index of each metadata element in the
 * grpc static metadata table to the index of that element in the hpack static
 * metadata table. If the element is not contained in the static hpack table,
 * then the returned index is 0.
 */

#include <grpc/support/port_platform.h>

#include "src/core/ext/transport/chttp2/transport/hpack_mapping.h"

const uint8_t grpc_hpack_static_mdelem_indices[GRPC_STATIC_MDELEM_COUNT] = {
    0,  0,  0,  0,  0,  0,  0,  0,  3,  8,  13, 6,  7,  0,  1,  2,  0,  4,
    5,  9,  10, 11, 12, 14, 15, 0,  16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
    0,  0,  26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
    42, 43, 44, 0,  0,  45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
    58, 59, 60, 61, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
};
@@ -1,38 +0,0 @@
/*
 * Copyright 2015 gRPC authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * WARNING: Auto-generated code.
 *
 * To make changes to this file, change
 * tools/codegen/core/gen_static_metadata.py, and then re-run it.
 *
 * This file contains the mapping from the index of each metadata element in the
 * grpc static metadata table to the index of that element in the hpack static
 * metadata table. If the element is not contained in the static hpack table,
 * then the returned index is 0.
 */

#ifndef GRPC_CORE_EXT_TRANSPORT_CHTTP2_TRANSPORT_HPACK_MAPPING_H
#define GRPC_CORE_EXT_TRANSPORT_CHTTP2_TRANSPORT_HPACK_MAPPING_H

#include <grpc/support/port_platform.h>

#include "src/core/lib/transport/static_metadata.h"

extern const uint8_t grpc_hpack_static_mdelem_indices[GRPC_STATIC_MDELEM_COUNT];

#endif /* GRPC_CORE_EXT_TRANSPORT_CHTTP2_TRANSPORT_HPACK_MAPPING_H */