If the calling application does not provide a signal number, then the gRPC library will relegate to using a model similar to the current implementation (where every thread does a blocking `poll()` on its `wakeup_fd` and the `epoll_fd`). The function` psi_wait() `in figure 2 implements this logic.
If the calling application does not provide a signal number, then the gRPC library will relegate to using a model similar to the current implementation (where every thread does a blocking `poll()` on its `wakeup_fd` and the `epoll_fd`). The function` psi_wait() `in figure 2 implements this logic.
**>> **(**NOTE**: Or alternatively, we can implement a turnstile polling (i.e having only one thread calling `epoll_wait()` on the epoll set at any time - which all other threads call poll on their `wakeup_fds`)
**>> **(**NOTE**: Or alternatively, we can implement a turnstile polling (i.e having only one thread calling `epoll_wait()` on the epoll set at any time - which all other threads call poll on their `wakeup_fds`)
in case of not getting a signal number from the applications.
in case of not getting a signal number from the applications.
@ -7,7 +7,7 @@ This document talks about how polling engine is used in gRPC core (both on clien
## gRPC client
## gRPC client
### Relation between Call, Channel (sub-channels), Completion queue, `grpc_pollset`
### Relation between Call, Channel (sub-channels), Completion queue, `grpc_pollset`
- A gRPC Call is tied to a channel (more specifically a sub-channel) and a completion queue for the lifetime of the call.
- A gRPC Call is tied to a channel (more specifically a sub-channel) and a completion queue for the lifetime of the call.
- Once a _sub-channel_ is picked for the call, the file-descriptor (socket fd in case of TCP channels) is added to the pollset corresponding to call's completion queue. (Recall that as per [grpc-cq](grpc-cq.md), a completion queue has a pollset by default)
- Once a _sub-channel_ is picked for the call, the file-descriptor (socket fd in case of TCP channels) is added to the pollset corresponding to call's completion queue. (Recall that as per [grpc-cq](grpc-cq.md), a completion queue has a pollset by default)
@ -14,7 +14,7 @@ The keepalive ping is controlled by two important channel arguments -
The above two channel arguments should be sufficient for most users, but the following arguments can also be useful in certain use cases.
The above two channel arguments should be sufficient for most users, but the following arguments can also be useful in certain use cases.
* **GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS**
* **GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS**
* This channel argument if set to 1 (0 : false; 1 : true), allows keepalive pings to be sent even if there are no calls in flight.
* This channel argument if set to 1 (0 : false; 1 : true), allows keepalive pings to be sent even if there are no calls in flight.
* **GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA**
* **GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA**
* This channel argument controls the maximum number of pings that can be sent when there is no data/header frame to be sent. GRPC Core will not continue sending pings if we run over the limit. Setting it to 0 allows sending pings without such a restriction.
* This channel argument controls the maximum number of pings that can be sent when there is no data/header frame to be sent. GRPC Core will not continue sending pings if we run over the limit. Setting it to 0 allows sending pings without such a restriction.
A third-party security audit of gRPC C++ stack was performed by [Cure53](https://cure53.de) in October 2019. The full report can be found [here](https://github.com/grpc/grpc/tree/master/doc/grpc_security_audit.pdf).
A third-party security audit of gRPC C++ stack was performed by [Cure53](https://cure53.de) in October 2019. The full report can be found [here](https://github.com/grpc/grpc/tree/master/doc/grpc_security_audit.pdf).
# Addressing grpc_security_audit
# Addressing grpc_security_audit
@ -21,7 +21,7 @@ Below is a list of alternatives that gRPC team considered.
### Alternative #1: Rewrite gpr_free to take void\*\*
### Alternative #1: Rewrite gpr_free to take void\*\*
One solution is to change the API of `gpr_free` so that it automatically nulls the given pointer after freeing it.
One solution is to change the API of `gpr_free` so that it automatically nulls the given pointer after freeing it.
```
```
gpr_free (void** ptr) {
gpr_free (void** ptr) {
@ -30,7 +30,7 @@ gpr_free (void** ptr) {
}
}
```
```
This defensive programming pattern would help protect gRPC from the potential exploits and latent dangling pointer bugs mentioned in the security report.
This defensive programming pattern would help protect gRPC from the potential exploits and latent dangling pointer bugs mentioned in the security report.
However, performance would be a significant concern as we are now unconditionally adding a store to every gpr_free call, and there are potentially hundreds of these per RPC. At the RPC layer, this can add up to prohibitive costs.
However, performance would be a significant concern as we are now unconditionally adding a store to every gpr_free call, and there are potentially hundreds of these per RPC. At the RPC layer, this can add up to prohibitive costs.
@ -61,7 +61,7 @@ Because of performance and maintainability concerns, GRP-01-002 will be addresse
## GRP-01-003 Calls to malloc suffer from potential integer overflows
## GRP-01-003 Calls to malloc suffer from potential integer overflows
The vulnerability, as defined by the report, is that calls to `gpr_malloc` in the C-core codebase may suffer from potential integer overflow in cases where we multiply the array element size by the size of the array. The penetration testers did not identify a concrete place where this occurred, but rather emphasized that the coding pattern itself had potential to lead to vulnerabilities. The report’s suggested solution for GRP-01-003 was to create a `calloc(size_t nmemb, size_t size)` wrapper that contains integer overflow checks.
The vulnerability, as defined by the report, is that calls to `gpr_malloc` in the C-core codebase may suffer from potential integer overflow in cases where we multiply the array element size by the size of the array. The penetration testers did not identify a concrete place where this occurred, but rather emphasized that the coding pattern itself had potential to lead to vulnerabilities. The report’s suggested solution for GRP-01-003 was to create a `calloc(size_t nmemb, size_t size)` wrapper that contains integer overflow checks.
However, gRPC team firmly believes that gRPC Core should only use integer overflow checks in the places where they’re needed; for example, any place where remote input influences the input to `gpr_malloc` in an unverified way. This is because bounds-checking is very expensive at the RPC layer.
However, gRPC team firmly believes that gRPC Core should only use integer overflow checks in the places where they’re needed; for example, any place where remote input influences the input to `gpr_malloc` in an unverified way. This is because bounds-checking is very expensive at the RPC layer.
Determining exactly where bounds-checking is needed requires an audit of tracing each `gpr_malloc` (or `gpr_realloc` or `gpr_zalloc`) call up the stack to determine if the sufficient bounds-checking was performed. This kind of audit, done manually, is fairly expensive engineer-wise.
Determining exactly where bounds-checking is needed requires an audit of tracing each `gpr_malloc` (or `gpr_realloc` or `gpr_zalloc`) call up the stack to determine if the sufficient bounds-checking was performed. This kind of audit, done manually, is fairly expensive engineer-wise.
All gRPC implementations use a three-part version number (`vX.Y.Z`) and follow [semantic versioning](https://semver.org/), which defines the semantics of major, minor and patch components of the version number. In addition to that, gRPC versions evolve according to these rules:
All gRPC implementations use a three-part version number (`vX.Y.Z`) and follow [semantic versioning](https://semver.org/), which defines the semantics of major, minor and patch components of the version number. In addition to that, gRPC versions evolve according to these rules:
- **Major version bumps** only happen on rare occasions. In order to qualify for a major version bump, certain criteria described later in this document need to be met. Most importantly, a major version increase must not break wire compatibility with other gRPC implementations so that existing gRPC libraries remain fully interoperable.
- **Major version bumps** only happen on rare occasions. In order to qualify for a major version bump, certain criteria described later in this document need to be met. Most importantly, a major version increase must not break wire compatibility with other gRPC implementations so that existing gRPC libraries remain fully interoperable.
- **Minor version bumps** happen approx. every 6 weeks as part of the normal release cycle as defined by the gRPC release process. A new release branch named vMAJOR.MINOR.PATCH) is cut every 6 weeks based on the [release schedule](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md).
- **Minor version bumps** happen approx. every 6 weeks as part of the normal release cycle as defined by the gRPC release process. A new release branch named vMAJOR.MINOR.PATCH) is cut every 6 weeks based on the [release schedule](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md).
- **Patch version bump** corresponds to bugfixes done on release branch.
- **Patch version bump** corresponds to bugfixes done on release branch.
@ -24,7 +24,7 @@ There are also a few extra rules regarding adding new gRPC implementations (e.g.
To avoid user confusion and simplify reasoning, the gRPC releases in different languages try to stay synchronized in terms of major and minor version (all languages follow the same release schedule). Nevertheless, because we also strictly follow semantic versioning, there are circumstances in which a gRPC implementation needs to break the version synchronicity and do a major version bump independently of other languages.
To avoid user confusion and simplify reasoning, the gRPC releases in different languages try to stay synchronized in terms of major and minor version (all languages follow the same release schedule). Nevertheless, because we also strictly follow semantic versioning, there are circumstances in which a gRPC implementation needs to break the version synchronicity and do a major version bump independently of other languages.
### Situations when it's ok to do a major version bump
### Situations when it's ok to do a major version bump
- **change forced by the language ecosystem:** when the language itself or its standard libraries that we depend on make a breaking change (something which is out of our control), reacting with updating gRPC APIs may be the only adequate response.
- **change forced by the language ecosystem:** when the language itself or its standard libraries that we depend on make a breaking change (something which is out of our control), reacting with updating gRPC APIs may be the only adequate response.
- **voluntary change:** Even in non-forced situations, there might be circumstances in which a breaking API change makes sense and represents a net win, but as a rule of thumb breaking changes are very disruptive for users, cause user fragmentation and incur high maintenance costs. Therefore, breaking API changes should be very rare events that need to be considered with extreme care and the bar for accepting such changes is intentionally set very high.
- **voluntary change:** Even in non-forced situations, there might be circumstances in which a breaking API change makes sense and represents a net win, but as a rule of thumb breaking changes are very disruptive for users, cause user fragmentation and incur high maintenance costs. Therefore, breaking API changes should be very rare events that need to be considered with extreme care and the bar for accepting such changes is intentionally set very high.
Example scenarios where a breaking API change might be adequate:
Example scenarios where a breaking API change might be adequate:
- fixing a security problem which requires changes to API (need to consider the non-breaking alternatives first)
- fixing a security problem which requires changes to API (need to consider the non-breaking alternatives first)