This PR enables DNS 0x20 as per
https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 .
DNS 0x20 adds additional entropy to the request by randomly altering the
case of the DNS question to help prevent cache poisoning attacks.
Google DNS has implemented this support as of 2023, even though this is
a proposed and expired standard from 2008:
https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M
There have been documented cases of name server and caching server
non-conformance, though it is expected to become more rare, especially
since Google has started using this.
This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently
disabled by default. The test cases do however enable this flag to
validate this feature.
Implementors using this flag will notice that responses will retain the
mixed case, but since DNS names are case-insensitive, any proper
implementation should not be impacted.
There is currently no fallback mechanism implemented as it isn't
immediately clear how this may affect a stub resolver like c-ares where
we aren't querying the authoritative name server, but instead an
intermediate recursive resolver where some domains may return invalid
results while others return valid results, all while querying the same
nameserver. Likely using DNS cookies as suggested by #620 is a better
mechanism to fight cache poisoning attacks for stub resolvers.
TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag
is specified since they are not subject to cache poisoning attacks.
Fixes Issue: #795
Fix By: Brad House (@bradh352)
With very little effort we should be able to determine fairly proper
timeouts we can use based on prior query history. We track in order to
be able to auto-scale when network conditions change (e.g. maybe there
is a provider failover and timings change due to that). Apple appears to
do this within their system resolver in MacOS. Obviously we should have
a minimum, maximum, and initial value to make sure the algorithm doesn't
somehow go off the rails.
Values:
- Minimum Timeout: 250ms (approximate RTT half-way around the globe)
- Maximum Timeout: 5000ms (Recommended timeout in RFC 1123), can be
reduced by ARES_OPT_MAXTIMEOUTMS, but otherwise the bound specified by
the option caps the retry timeout.
- Initial Timeout: User-specified via configuration or
ARES_OPT_TIMEOUTMS
- Average latency multiplier: 5x (a local DNS server returning a cached
value will be quicker than if it needs to recurse so we need to account
for this)
- Minimum Count for Average: 3. This is the minimum number of queries we
need to form an average for the bucket.
Per-server buckets for tracking latency over time (these are ephemeral
meaning they don't persist once a channel is destroyed). We record both
the current timespan for the bucket and the immediate preceding timespan
in case of roll-overs we can still maintain recent metrics for
calculations:
- 1 minute
- 15 minutes
- 1 hr
- 1 day
- since inception
Each bucket contains:
- timestamp (divided by interval)
- minimum latency
- maximum latency
- total time
- count
NOTE: average latency is (total time / count), we will calculate this
dynamically when needed
Basic algorithm for calculating timeout to use would be:
- Scan from most recent bucket to least recent
- Check timestamp of bucket, if doesn't match current time, continue to
next bucket
- Check count of bucket, if its not at least the "Minimum Count for
Average", check the previous bucket, otherwise continue to next bucket
- If we reached the end with no bucket match, use "Initial Timeout"
- If bucket is selected, take ("total time" / count) as Average latency,
multiply by "Average Latency Multiplier", bound by "Minimum Timeout" and
"Maximum Timeout"
NOTE: The timeout calculated may not be the timeout used. If we are
retrying
the query on the same server another time, then it will use a larger
value
On each query reply where the response is legitimate (proper response or
NXDOMAIN) and not something like a server error:
- Cycle through each bucket in order
- Check timestamp of bucket against current timestamp, if out of date
overwrite previous entry with values, clear current values
- Compare current minimum and maximum recorded latency against query
time and adjust if necessary
- Increment "count" by 1 and "total time" by the query time
Other Notes:
- This is always-on, the only user-configurable value is the initial
timeout which will simply re-uses the current option.
- Minimum and Maximum latencies for a bucket are currently unused but
are there in case we find a need for them in the future.
Fixes Issue: #736
Fix By: Brad House (@bradh352)
The query cache should be enabled by default. This will help with
determining proper timeouts for #736. It can still be disabled by
setting the ttl to 0. There should be no negative consequences of this
in real-world scenarios since DNS is based on the TTL concept and
upstream servers will cache results and not recurse based on this
information anyhow. DNS queries and responses are very small, this
should have negligible impact on memory consumption.
Fix By: Brad House (@bradh352)
Automatically detect configuration changes and reload. On systems which
provide notification mechanisms, use those, otherwise fallback to
polling. When a system configuration change is detected, it
asynchronously applies the configuration in order to ensure it is a
non-blocking operation for any queries which may still be being
processed.
On Windows, however, changes aren't detected if a user manually
sets/changes the DNS servers on an interface, it doesn't appear there is
any mechanism capable of this. We are relying on
`NotifyIpInterfaceChange()` for notifications.
Fixes Issue: #613
Fix By: Brad House (@bradh352)
**Summary**
By default c-ares will select the server with the least number of
consecutive failures when sending a query. However, this means that if a
server temporarily goes down and hits failures (e.g. a transient network
issue), then that server will never be retried until all other servers
hit the same number of failures.
This is an issue if the failed server is preferred to other servers in
the list. For example if a primary server and a backup server are
configured.
This PR adds new server failover retry behavior, where failed servers
are retried with small probability after a minimum delay has passed. The
probability and minimum delay are configurable via the
`ARES_OPT_SERVER_FAILOVER` option. By default c-ares will use a
probability of 10% and a minimum delay of 5 seconds.
In addition, this PR includes a small change to always close out
connections to servers which have hit failures, even with
`ARES_FLAG_STAYOPEN`. It's possible that resetting the connection can
resolve some server issues (e.g. by resetting the source port).
**Testing**
A new set of regression tests have been added to test the new server
failover retry behavior.
Fixes Issue: #717
Fix By: Oliver Welsh (@oliverwelsh)
As per Issue #734 some people use `ndots:0` in their configuration which
is allowed by the system resolver but not by c-ares. Add support for
`ndots:0` and add a test case to validate this behavior.
Fixes Issue: #734
Fix By: Brad House (@bradh352)
Hello, I work on an application for Microsoft which uses c-ares to
perform DNS lookups. We have made some minor changes to the library over
time, and would like to contribute these back to the project in case
they are useful more widely. This PR adds a new channel init flag,
described below.
Please let me know if I can include any more information to make this PR
better/easier for you to review. Thanks!
**Summary**
When initializing a channel with `ares_init_options()`, if there are no
nameservers available (because `ARES_OPT_SERVERS` is not used and
`/etc/resolv.conf` is either empty or not available) then a default
local named server will be added to the channel.
However in some applications a local named server will never be
available. In this case, all subsequent queries on the channel will
fail.
If we know this ahead of time, then it may be preferred to fail channel
initialization directly rather than wait for the queries to fail. This
gives better visibility, since we know that the failure is due to
missing servers rather than something going wrong with the queries.
This PR adds a new flag `ARES_FLAG_NO_DFLT_SVR`, to indicate that a
default local named server should not be added to a channel in this
scenario. Instead, a new error `ARES_EINITNOSERVER` is returned and
initialization fails.
**Testing**
I have added 2 new FV tests:
- `ContainerNoDfltSvrEmptyInit` to test that initialization fails when
no nameservers are available and the flag is set.
- `ContainerNoDfltSvrFullInit` to test that initialization still
succeeds when the flag is set but other nameservers are available.
Existing FVs are all passing.
**Documentation**
I have had a go at manually updating the docs to describe the new
flag/error, but couldn't see any contributing guidance about testing
this. Please let me know if you'd like anything more here.
---------
Fix By: Oliver Welsh (@oliverwelsh)
This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
c-ares does not have any concept of thread-safety. It has always been 100% up to the implementor to ensure they never call c-ares from more than one thread at a time. This patch adds basic thread-safety support, which can be disabled at compile time if not desired. It uses a single recursive mutex per channel, which should be extremely quick when uncontested so overhead should be minimal.
Fixes Bug: #610
Also sets the stage to implement #611
Fix By: Brad House (@bradh352)
This PR implements a query cache at the lowest possible level, the actual dns request and response messages. Only successful and `NXDOMAIN` responses are cached. The lowest TTL in the response message determines the cache validity period for the response, and is capped at the configuration value for `qcache_max_ttl`. For `NXDOMAIN` responses, the SOA record is evaluated.
For a query to match the cache, the opcode, flags, and each question's class, type, and name are all evaluated. This is to prevent matching a cached entry for a subtly different query (such as if the RD flag is set on one request and not another).
For things like ares_getaddrinfo() or ares_search() that may spawn multiple queries, each individual message received is cached rather than the overarching response. This makes it possible for one query in the sequence to be purged from the cache while others still return cached results which means there is no chance of ever returning stale data.
We have had a lot of user requests to return TTLs on all the various parsers like `ares_parse_caa_reply()`, and likely this is because they want to implement caching mechanisms of their own, thus this PR should solve those issues as well.
Due to the internal data structures we have these days, this PR is less than 500 lines of new code.
Fixes#608
Fix By: Brad House (@bradh352)
The retry timeout values were using a fixed calculation which could cause multiple simultaneous queries to timeout and retry at the exact same time. If a DNS server is throttling requests, this could cause the issue to never self-resolve due to all requests recurring at the same instance again.
This PR also creates a maximum timeout option to make sure the random value selected does not exceed this value.
Fix By: Ignat (@Kontakter)
All DNS servers support EDNS, by using this by default, it will allow larger responses without the need to switch to TCP. If by chance a DNS server is hit that doesn't support EDNS, this is detected due to the lack of the OPT RR in the response and will be automatically retried without EDNS.
Fix By: Brad House (@bradh352)
`ares_channel` is defined as `typedef struct ares_channeldata *ares_channel;`. The problem with this, is it embeds the pointer into the typedef, which means an `ares_channel` can never be declared as `const` as if you write `const ares_channel channel`, that expands to `struct ares_channeldata * const ares_channel` and not `const struct ares_channeldata *channel`.
We will now typedef `ares_channel_t` as `typedef struct ares_channeldata ares_channel_t;`, so if you write `const ares_channel_t *channel`, it properly expands to `const struct ares_channeldata *channel`.
We are maintaining the old typedef for API compatibility with existing integrations, and due to typedef expansion this should not even cause any compiler warnings for existing code. There are no ABI implications with this change. I could be convinced to keep existing public functions as `ares_channel` if a sufficient argument exists, but internally we really need make this change for modern best practices.
This change will allow us to internally use `const ares_channel_t *` where appropriate. Whether or not we decide to change any public interfaces to use `const` may require further discussion on if there might be ABI implications (I don't think so, but I'm also not 100% sure what a compiler internally does with `const` when emitting machine code ... I think more likely ABI implications would occur going the opposite direction).
FYI, This PR was done via a combination of sed and clang-format, the only manual code change was the addition of the new typedef, and a couple doc fixes :)
Fix By: Brad House (@bradh352)
When referring to another c-ares function use \fI function(3) \fP to let
the webpage rendering find and cross-link them appropriately.
SEE ALSO references should be ".BR name (3),", with a space before the
open parenthesis. This helps the manpage to HTML renderer.
Closes#565
Add a new ARES_OPT_UDP_MAX_QUERIES option with udp_max_queries parameter that can be passed to ares_init_options(). This value defaults to 0 (unlimited) to maintain existing compatibility, any positive number will cause new UDP ephemeral ports to be created once the threshold is reached, we'll call these 'connections' even though its technically wrong for UDP.
Implementation Details:
* Each server entry in a channel now has a linked-list of connections/ports for udp and tcp. The first connection in the list is the one most likely to be eligible to accept new queries.
* Queries are now tracked by connection rather than by server.
* Every time a query is detached from a connection, the connection that it was attached to will be checked to see if it needs to be cleaned up.
* Insertion, lookup, and searching for connections has been implemented as O(1) complexity so the number of connections will not impact performance.
* Remove is_broken from the server, it appears it would be set and immediately unset, so must have been invalidated via a prior patch. A future patch should probably track consecutive server errors and de-prioritize such servers. The code right now will always try servers in the order of configuration, so a bad server in the list will always be tried and may rely on timeout logic to try the next.
* Various other cleanups to remove code duplication and for clarification.
Fixes Bug: #444
Fix By: Brad House (@bradh352)
A lot of time has passed since the original timeouts and retry counts were chosen. We have on and off issues reported due to this. Even on geostationary satellite links, latency is worst case around 1.5s. This PR changes the per-server timeout to 2s and the retry count lowered from 4 to 3.
Fix By: Brad House (@bradh352)
All files have their licence and copyright information clearly
identifiable. If not in the file header, they are set separately in
.reuse/dep5.
All used license texts are provided in LICENSES/
As per #487, documentation states the port should be in network byte
order, but we can see from the test cases using MockServers on
different ports that this is not the case, it is definitely in host
byte order.
Fix By: Brad House (@bradh352)
Originally started by Daniel Stenberg (@bagder) with #123, this patch reorganizes the c-ares source tree to have a more modern layout. It also fixes out of tree builds for autotools, and automatically builds the tests if tests are enabled. All tests are passing which tests each of the supported build systems (autotools, cmake, nmake, mingw gmake). There may be some edge cases that will have to be caught later on for things I'm not aware of.
Fix By: Brad House (@bradh352)
The rc4 function iterates over a buffer of size buffer_len who's maximum
value is INT_MAX with a counter of type short that is not guaranteed to
have maximum size INT_MAX.
In circumstances where short is narrower than int and where buffer_len
is larger than the maximum value of a short, it may be possible to loop
infinitely as counter will overflow and never be greater than or equal
to buffer_len.
The solution is to make the comparison be between types of equal width.
This commit defines counter as an int.
Fix By: Fionn Fitzmaurice (@fionn)