We have been missing basic documentation on some of the features in c-ares for a while. This document should be used to document any features that need explanation for behavior analysis and how to use said feature.

Authored-By: Brad House (@bradh352)

Ref: pull/871/head; parent 063379049f; commit 5410a79428

3 changed files with 242 additions and 1 deletion
# Features

Information about a few features in c-ares which can provide insight into
behavior and security of the system, and what tunables may be used to tweak
operation.
|
- [Dynamic Server Timeout Calculation](#dynamic-server-timeout-calculation)
- [Failed Server Isolation](#failed-server-isolation)
- [Query Cache](#query-cache)
- [DNS 0x20 Query Name Case Randomization](#dns-0x20-query-name-case-randomization)
- [DNS Cookies](#dns-cookies)
- [TCP FastOpen (0-RTT)](#tcp-fastopen-0-rtt)
- [Event Thread](#event-thread)
- [System Configuration Change Monitoring](#system-configuration-change-monitoring)
|
## Dynamic Server Timeout Calculation

Metrics are stored for every server in time series buckets for both the current
time span and prior time span in 1 minute, 15 minute, 1 hour, and 1 day
intervals, plus a single since-inception bucket (covering the server's lifetime
in the c-ares channel).

These metrics are then used to calculate the average latency for queries on
each server, which automatically adjusts to network conditions. This average
is then multiplied by 5 to come up with a timeout to use for the query before
re-queuing it. If there is not yet sufficient data to calculate a timeout
(at least 3 prior queries are needed), the default of 2000ms is used (or an
administrator-set `ARES_OPT_TIMEOUTMS`).

The timeout is then bounded below by 250ms, which is the approximate RTT of
network traffic half-way around the world, to account for the upstream server
needing to recurse to a DNS server far away. It is also bounded above by
5000ms (or an administrator-set `ARES_OPT_MAXTIMEOUTMS`).

If a server does not reply within the calculated timeout, the timeout will
approximately double the next time the query is re-queued to the same server.
This in turn leads to automatic timeout adjustments once a successful reply is
recorded.
|
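The calculation described above can be modeled in a few lines of C. This is a simplified sketch under stated assumptions, not the c-ares implementation: it keeps a single running latency total per server instead of the time-series buckets, it models the retry backoff as an exact doubling (c-ares only approximately doubles), and the names `latency_stats` and `compute_timeout_ms()` are hypothetical. The constants mirror the defaults stated in this section.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_QUERIES    3   /* samples needed before trusting the average */
#define TIMEOUT_MULT   5   /* timeout = average latency * 5 */
#define MIN_TIMEOUT_MS 250 /* ~RTT half-way around the world */

/* Hypothetical per-server summary; c-ares itself keeps 1 minute, 15 minute,
 * 1 hour, 1 day and since-inception buckets. */
struct latency_stats {
  unsigned long long total_ms; /* sum of observed query latencies */
  size_t             count;    /* number of queries observed */
};

/* Timeout for the next attempt of a query on one server.
 *   default_ms  - used until MIN_QUERIES samples exist (cf. ARES_OPT_TIMEOUTMS)
 *   max_ms      - upper bound (cf. ARES_OPT_MAXTIMEOUTMS, 5000 by default)
 *   prior_tries - how many times this query already timed out on this server */
static unsigned compute_timeout_ms(const struct latency_stats *s,
                                   unsigned default_ms, unsigned max_ms,
                                   unsigned prior_tries)
{
  unsigned long long timeout;
  unsigned           i;

  assert(s != NULL);

  if (s->count < MIN_QUERIES) {
    timeout = default_ms;
  } else {
    timeout = (s->total_ms / s->count) * TIMEOUT_MULT;
  }

  /* Exact doubling per prior timeout; the real backoff is only approximate. */
  for (i = 0; i < prior_tries && timeout < max_ms; i++) {
    timeout *= 2;
  }

  if (timeout < MIN_TIMEOUT_MS) {
    timeout = MIN_TIMEOUT_MS;
  }
  if (timeout > max_ms) {
    timeout = max_ms;
  }
  return (unsigned)timeout;
}
```

With 600ms of total latency over 3 queries (a 200ms average), the sketch returns 1000ms, then 2000ms after one timeout on that server; a 10ms average is raised to the 250ms floor.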
|
In order to calculate the optimal timeout, it is highly recommended to ensure
`ARES_OPT_QUERY_CACHE` is enabled with a non-zero `qcache_max_ttl` (it is
enabled by default with a default max TTL of 3600s). The goal is to record
the upstream server's recursion time as part of the query latency; since the
upstream server will also cache results, repeated queries answered from its
cache would otherwise skew the measured latency downward.

This feature requires the c-ares channel to persist for the lifetime of the
application.
|
## Failed Server Isolation

Each server is tracked for failures relating to consecutive connectivity issues
or unrecoverable response codes. Servers are sorted in priority order based
on this metric. A downed server is brought back online either when the
current highest-priority server has also failed, or when a query that is
randomly selected to probe the downed server finds it to be online.

By default a downed server won't be retried for 5 seconds, and after this
timeframe each query has a 10% chance of being chosen to test a downed server.
Administrators may customize these settings via `ARES_OPT_SERVER_FAILOVER`.
|
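A minimal sketch of the selection rule, assuming servers are ordered by consecutive failures with ties broken by configuration order (the tie-break is an assumption). The `struct server` type, `sort_servers()`, `select_server()` and their parameters are hypothetical: `retry_chance` is expressed as a 1-in-N probability (so the default 10% corresponds to 10) and `retry_delay_ms` as the 5 second minimum wait. The random roll is passed in so the logic stays deterministic; this sketch simply probes the first eligible downed server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-server failure record used only by this sketch. */
struct server {
  const char *name;
  unsigned    consecutive_failures; /* reset to 0 on a successful reply */
  long long   last_failure_ms;      /* time of the most recent failure */
  size_t      config_idx;           /* position in the configured list */
};

/* Fewest consecutive failures first; ties keep configuration order. */
static int cmp_server(const void *pa, const void *pb)
{
  const struct server *a = pa;
  const struct server *b = pb;

  if (a->consecutive_failures != b->consecutive_failures) {
    return a->consecutive_failures < b->consecutive_failures ? -1 : 1;
  }
  if (a->config_idx != b->config_idx) {
    return a->config_idx < b->config_idx ? -1 : 1;
  }
  return 0;
}

static void sort_servers(struct server *servers, size_t n)
{
  qsort(servers, n, sizeof(*servers), cmp_server);
}

/* Chooses the server for the next query from a sorted list.
 *   retry_chance   - probe with a 1-in-N chance; 0 disables probing
 *   retry_delay_ms - minimum time a downed server is left alone
 *   roll           - caller-supplied random value, e.g. (unsigned)rand() */
static const struct server *select_server(const struct server *servers,
                                          size_t n, long long now_ms,
                                          unsigned retry_chance,
                                          long long retry_delay_ms,
                                          unsigned roll)
{
  size_t i;

  assert(servers != NULL && n > 0);

  if (retry_chance != 0 && roll % retry_chance == 0) {
    /* This query was picked as a probe: use the first downed server whose
     * retry delay has elapsed, if any. */
    for (i = 0; i < n; i++) {
      if (servers[i].consecutive_failures > 0 &&
          now_ms - servers[i].last_failure_ms >= retry_delay_ms) {
        return &servers[i];
      }
    }
  }
  /* Otherwise use the highest-priority server. */
  return &servers[0];
}
```

A caller would re-sort whenever a success or failure updates the counters, so a recovered server moves back to the front of the list.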
|
In the future we may use independent queries to probe downed servers so as not
to impact the latency of any queries when a server is known to be down.

`ARES_OPT_ROTATE` or a system configuration option of `rotate` will disable
this feature as servers will be chosen at random. In the future we may
enhance this capability to only randomly choose online servers.

This feature requires the c-ares channel to persist for the lifetime of the
application.
|
## Query Cache

Every successful query response, as well as every `NXDOMAIN` response
containing an `SOA` record, is cached using the returned `TTL` or the SOA
Minimum as appropriate. This timeout is bounded by the `ARES_OPT_QUERY_CACHE`
`qcache_max_ttl`, which defaults to 1 hour.
|
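The TTL rule above can be sketched as a small C function. It assumes that a successful answer is cached for the smallest TTL among its records and that an `NXDOMAIN` uses the SOA MINIMUM field; other outcomes (such as `SERVFAIL`) are not cached in this sketch. The `cache_ttl()` function and `resp_kind` enum are hypothetical names, not c-ares API.

```c
#include <assert.h>
#include <stddef.h>

/* Response outcomes distinguished by this sketch. */
enum resp_kind { RESP_SUCCESS, RESP_NXDOMAIN, RESP_OTHER };

/* Returns how many seconds a response may be cached; 0 means "do not cache".
 *   rr_ttls/n_rr   - TTLs of the records in a successful answer
 *   has_soa        - nonzero if an NXDOMAIN response carried an SOA record
 *   soa_minimum    - the SOA MINIMUM field
 *   qcache_max_ttl - upper bound; 0 disables the cache */
static unsigned cache_ttl(enum resp_kind kind, const unsigned *rr_ttls,
                          size_t n_rr, int has_soa, unsigned soa_minimum,
                          unsigned qcache_max_ttl)
{
  unsigned ttl;
  size_t   i;

  if (qcache_max_ttl == 0) {
    return 0; /* caching disabled */
  }

  if (kind == RESP_SUCCESS && n_rr > 0) {
    assert(rr_ttls != NULL);
    ttl = rr_ttls[0];
    for (i = 1; i < n_rr; i++) {
      if (rr_ttls[i] < ttl) {
        ttl = rr_ttls[i]; /* lowest TTL wins */
      }
    }
  } else if (kind == RESP_NXDOMAIN && has_soa) {
    ttl = soa_minimum;
  } else {
    return 0; /* not cacheable in this sketch */
  }

  return ttl < qcache_max_ttl ? ttl : qcache_max_ttl;
}
```

With the default bound of 3600, an answer whose records carry TTLs of 300, 60 and 7200 is cached for 60 seconds, while a single record with a one day TTL is capped at 3600.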
|
The query is cached at the lowest possible layer. A call into
`ares_search_dnsrec()` or `ares_getaddrinfo()` may spawn multiple queries
in order to complete its lookup, and each individual backend query result is
cached.

Any server list change will automatically invalidate the cache in order to
purge any possibly stale data. For example, if `NXDOMAIN` is cached but the
system configuration has changed due to a VPN connection, the same query might
now result in a valid response.

This feature is not expected to cause any issues that wouldn't already be
present due to the upstream DNS server having substantially similar caching
already. However, if desired, it can be disabled by setting `qcache_max_ttl`
to `0`.

This feature requires the c-ares channel to persist for the lifetime of the
application.
|
## DNS 0x20 Query Name Case Randomization

DNS 0x20 is the name of the feature which automatically randomizes the case
of the characters in a UDP query as defined in
[draft-vixie-dnsext-dns0x20-00](https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00).

For example, if name resolution is performed for `www.example.com`, the actual
query sent to the upstream name server may be `Www.eXaMPlE.cOM`.

The reason to randomize character case is to provide additional entropy in the
query so that off-path cache poisoning attacks over UDP can be detected: a
legitimate response echoes the query name with exactly the same case, so a
spoofed response that does not match is rejected. It is not used for TCP
connections, which are not known to be vulnerable to such attacks due to their
stateful nature.
|
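Case randomization and the matching check on the response can be sketched as follows. The random bits are supplied by the caller (one bit per character) so the example is deterministic; a real resolver would draw them from a secure random source. `dns0x20_randomize()` and `dns0x20_response_ok()` are hypothetical helpers, and the check assumes the response's question name is compared byte-for-byte, including case, against the name that was sent.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Randomizes the case of every ASCII letter in `name`, in place.
 * rand_bytes must provide at least one bit per character of `name`;
 * bit i of the stream selects upper (1) or lower (0) case for name[i]. */
static void dns0x20_randomize(char *name, const unsigned char *rand_bytes)
{
  size_t i;

  assert(name != NULL && rand_bytes != NULL);

  for (i = 0; name[i] != '\0'; i++) {
    unsigned char c   = (unsigned char)name[i];
    int           bit = (rand_bytes[i / 8] >> (i % 8)) & 1;

    if (isalpha(c)) {
      name[i] = (char)(bit ? toupper(c) : tolower(c));
    }
  }
}

/* Accepts a response only if the question name it echoes matches the name
 * that was sent exactly, including case; a spoofed reply must guess it. */
static int dns0x20_response_ok(const char *sent, const char *echoed)
{
  return strcmp(sent, echoed) == 0;
}
```

Each letter contributes one bit of entropy, so `www.example.com` (13 letters) adds up to 13 bits on top of the query ID and source port.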
|
Much research has been performed by
[Google](https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M)
on case randomization, and in general it has been found to be effective and
widely supported.

This feature is disabled by default and can be enabled via `ARES_FLAG_DNS0x20`.
There are some instances where servers do not properly facilitate this feature.
Unlike a recursive resolver, which may be able to determine that an
authoritative server is incapable, a stub resolver has a much harder time
reaching any reliable conclusion about where the issue resides. Due to the
recent wide deployment of DNS 0x20 in large public DNS servers, compatibility
is expected to improve rapidly, and in time this feature may be enabled by
default.

Another feature which can be used to prevent off-path cache poisoning attacks
is [DNS Cookies](#dns-cookies).
|
## DNS Cookies

DNS Cookies are a method of learned mutual authentication between a server
and a client as defined in
[RFC7873](https://datatracker.ietf.org/doc/html/rfc7873)
and [RFC9018](https://datatracker.ietf.org/doc/html/rfc9018).

This mutual authentication ensures clients are protected from off-path cache
poisoning attacks, and protects servers from being used as DNS amplification
attack sources. Many servers will disable query throttling limits when DNS
Cookies are in use. DNS Cookies only apply to UDP connections.

Since DNS Cookies are optional and learned dynamically, this is an always-on
feature and will automatically adjust based on the upstream server state. The
only potential issue arises if a server that once supported DNS Cookies stops
supporting them: c-ares must wait for a regression timeout of 2 minutes to
elapse before it can accept responses without cookies. Such a scenario would
be exceedingly rare.
|
|
Interestingly, the large public recursive DNS servers such as those provided by
[Google](https://developers.google.com/speed/public-dns/docs/using),
[CloudFlare](https://one.one.one.one/), and
[OpenDNS](https://opendns.com) do not have this feature enabled. That said,
most DNS products like [BIND](https://www.isc.org/bind/) enable DNS Cookies
by default.

This feature requires the c-ares channel to persist for the lifetime of the
application.
|
## TCP FastOpen (0-RTT)

TCP Fast Open is defined in [RFC7413](https://datatracker.ietf.org/doc/html/rfc7413)
and enables data to be sent with the TCP SYN packet when establishing the
connection, thus rivaling the performance of UDP. A previous connection must
already have been established in order to obtain the client cookie that allows
the server to trust the data sent in the first packet and know it was not
an off-path attack.

TCP FastOpen can only be used with idempotent requests since, in timeout
conditions, the SYN packet with data may be re-sent, which may cause the server
to process the packet more than once. Luckily, DNS requests are idempotent.

TCP FastOpen is supported on Linux, MacOS, and FreeBSD. Most other systems do
not support this feature, or, as on Windows, require the use of completion
notifications, whereas c-ares relies on readiness notifications.

Supported systems also need to be configured appropriately on both the client
and server systems.

### Linux

sysctl `net.ipv4.tcp_fastopen`:

- `1` = client only (typically default)
- `2` = server only
- `3` = client and server
|
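On Linux, `net.ipv4.tcp_fastopen` is a bitmask: bit `0x1` enables the client side, bit `0x2` the server side, and higher bits select additional options. A program checking whether client-side TFO is available should therefore test the bit rather than compare the value against `1`. The sketch below reads the value through procfs; `parse_tcp_fastopen()` and `read_tcp_fastopen()` are hypothetical helpers that simply report failure where `/proc/sys/net/ipv4/tcp_fastopen` does not exist.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Interprets the text of /proc/sys/net/ipv4/tcp_fastopen.
 * Returns 0 on success, -1 if the text does not start with a number. */
static int parse_tcp_fastopen(const char *text, int *client, int *server)
{
  char *end;
  long  v;

  assert(text != NULL && client != NULL && server != NULL);

  v = strtol(text, &end, 10);
  if (end == text) {
    return -1;
  }
  *client = (v & 0x1) != 0; /* bit 0x1: client side enabled */
  *server = (v & 0x2) != 0; /* bit 0x2: server side enabled */
  return 0;
}

/* Reads the Linux sysctl through procfs; returns -1 where it is unavailable
 * (for example on MacOS or FreeBSD, which use different sysctls). */
static int read_tcp_fastopen(int *client, int *server)
{
  char  buf[32];
  FILE *fp = fopen("/proc/sys/net/ipv4/tcp_fastopen", "r");

  if (fp == NULL) {
    return -1;
  }
  if (fgets(buf, sizeof(buf), fp) == NULL) {
    fclose(fp);
    return -1;
  }
  fclose(fp);
  return parse_tcp_fastopen(buf, client, server);
}
```

For example, a value of `1026` (`0x402`) enables the server side with an extra option bit set, but not the client side that c-ares needs.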
|
### MacOS

sysctl `net.inet.tcp.fastopen`:

- `1` = client only
- `2` = server only
- `3` = client and server (typically default)

### FreeBSD

sysctl `net.inet.tcp.fastopen.server_enable` (boolean) and
`net.inet.tcp.fastopen.client_enable` (boolean).
|
## Event Thread

Historically, c-ares integrations required integrators to run their own event
loop, which had to notify c-ares of read and write events for each socket.
Integrators also had to notify c-ares at the appropriate timeout if no events
had occurred. This could be difficult to do correctly and could lead to stalls
or other issues. The Event Thread removes this burden: c-ares runs its own
event loop in a dedicated thread and processes socket events and timeouts
automatically.

The Event Thread is currently supported on all systems except DOS, which does
not natively support threading (however, it could in theory be possible to
enable with something like [FSUpthreads](https://arcb.csc.ncsu.edu/~mueller/pthreads/)).

c-ares is built by default with threading support enabled; however, it may be
disabled at compile time. The event thread must also be specifically enabled
via `ARES_OPT_EVENT_THREAD`.

Using the Event Thread feature also facilitates some other features like
[System Configuration Change Monitoring](#system-configuration-change-monitoring),
and automatically enables the `ares_set_pending_write_cb()` feature to optimize
multi-query writing.
|
## System Configuration Change Monitoring

The system configuration is automatically monitored for changes to the network
and DNS settings. When a change is detected, a thread is spawned to read the
new configuration and then apply it to the current c-ares configuration.

This feature requires the [Event Thread](#event-thread) to be enabled via
`ARES_OPT_EVENT_THREAD`. Otherwise it is up to the integrator to do their own
configuration monitoring and call `ares_reinit()` to reload the system
configuration.
|
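Without the Event Thread, an integrator needs some way to notice changes before calling `ares_reinit()`. The simplest portable approach is polling; the sketch below assumes a POSIX `stat()` and watches a single `/etc/resolv.conf`-style file, reporting when its modification time or size changes or when it appears or disappears. It is a hypothetical integrator-side helper, not part of c-ares, and `struct file_sig` and `config_changed()` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical snapshot of a watched configuration file. */
struct file_sig {
  int    exists;
  time_t mtime;
  off_t  size;
};

/* Returns 1 if the file at `path` changed since `*last` was recorded
 * (modified, created or removed), 0 otherwise, and refreshes `*last`.
 * On a change the caller would typically call ares_reinit(channel). */
static int config_changed(const char *path, struct file_sig *last)
{
  struct stat     st;
  struct file_sig now;
  int             changed;

  assert(path != NULL && last != NULL);

  memset(&now, 0, sizeof(now));
  if (stat(path, &st) == 0) {
    now.exists = 1;
    now.mtime  = st.st_mtime;
    now.size   = st.st_size;
  }

  changed = now.exists != last->exists || now.mtime != last->mtime ||
            now.size != last->size;
  *last = now;
  return changed;
}
```

A caller would prime the snapshot with one call at startup, ignore that result, and then call `config_changed()` on each poll interval, invoking `ares_reinit()` whenever it returns 1.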
|
It is supported on Windows, MacOS, iOS and any system configuration that uses
`/etc/resolv.conf` and similar files, such as Linux and FreeBSD. DOS and
Android are specifically excluded due to missing mechanisms to support such a
feature.

This feature requires the c-ares channel to persist for the lifetime of the
application.