Automatically detect configuration changes and reload. On systems which
provide notification mechanisms, use those, otherwise fallback to
polling. When a system configuration change is detected, it
asynchronously applies the configuration in order to ensure it is a
non-blocking operation for any queries which may still be being
processed.
On Windows, however, changes aren't detected if a user manually
sets/changes the DNS servers on an interface, it doesn't appear there is
any mechanism capable of this. We are relying on
`NotifyIpInterfaceChange()` for notifications.
Fixes Issue: #613
Fix By: Brad House (@bradh352)
**Summary**
This PR adds a server state callback that is invoked whenever a query to
a DNS server finishes.
The callback is invoked with the server details (as a string), a boolean
indicating whether the query succeeded or failed, flags describing the
query (currently just indicating whether TCP or UDP was used), and
custom userdata.
This can be used by user applications to gain observability into DNS
server health and usage. For example, alerts when a DNS server
fails/recovers or metrics to track how often a DNS server is used and
responds successfully.
**Testing**
Three new regression tests `MockChannelTest.ServStateCallback*` have
been added to test the new callback in different success/failure
scenarios.
Fix By: Oliver Welsh (@oliverwelsh)
Improve reliability in the server retry delay regression tests by
increasing the retry delay and sleeping for a little more than the retry
delay when attempting to force retries.
This helps to account for unreliable timing (e.g. NTP slew)
intermittently breaking pipelines.
Fix By: Oliver Welsh (@oliverwelsh)
**Summary**
By default c-ares will select the server with the least number of
consecutive failures when sending a query. However, this means that if a
server temporarily goes down and hits failures (e.g. a transient network
issue), then that server will never be retried until all other servers
hit the same number of failures.
This is an issue if the failed server is preferred to other servers in
the list. For example if a primary server and a backup server are
configured.
This PR adds new server failover retry behavior, where failed servers
are retried with small probability after a minimum delay has passed. The
probability and minimum delay are configurable via the
`ARES_OPT_SERVER_FAILOVER` option. By default c-ares will use a
probability of 10% and a minimum delay of 5 seconds.
In addition, this PR includes a small change to always close out
connections to servers which have hit failures, even with
`ARES_FLAG_STAYOPEN`. It's possible that resetting the connection can
resolve some server issues (e.g. by resetting the source port).
**Testing**
A new set of regression tests have been added to test the new server
failover retry behavior.
Fixes Issue: #717
Fix By: Oliver Welsh (@oliverwelsh)
As per Issue #734 some people use `ndots:0` in their configuration which
is allowed by the system resolver but not by c-ares. Add support for
`ndots:0` and add a test case to validate this behavior.
Fixes Issue: #734
Fix By: Brad House (@bradh352)
Multiple functions have been deprecated over the years, annotate them
with attribute deprecated.
When possible show a message about their replacements.
This is a continuation/completion of PR #706
Fix By: Cristian Rodríguez (@crrodriguez)
This PR adds a new function `ares_search_dnsrec()` to search for records
using the new DNS record parser.
The function takes an arbitrary DNS record object to search (that must
represent a query for a single name). The function takes a new callback
type, `ares_callback_dnsrec`, that is invoked with a parsed DNS record
object rather than the raw buffer(+length).
The original motivation for this change is to provide support for
[draft-kaplan-enum-sip-routing-04](https://datatracker.ietf.org/doc/html/draft-kaplan-enum-sip-routing-04);
when routing phone calls using an ENUM server, it can be useful to
include identifying source information in an OPT RR options value, to
help select the appropriate route for the call. The new function allows
for more customisable searches like this.
**Summary of code changes**
A new function `ares_search_dnsrec()` has been added and exposed.
Moreover, the entire `ares_search_int()` internal code flow has been
refactored to use parsed DNS record objects and the new DNS record
parser. The DNS record object is passed through the `search_query`
structure by encoding/decoding to/from a buffer (if multiple search
domains are used). A helper function `ares_dns_write_query_altname()` is
used to re-write the DNS record object with a new query name (used to
append search domains).
`ares_search()` is now a wrapper around the new internal code, where the
DNS record object is created based on the name, class and type
parameters.
The new function uses a new callback type, `ares_callback_dnsrec`. This
is invoked with a parsed DNS record object. For now, we convert from
`ares_callback` to this new type using `ares__dnsrec_convert_cb()`.
Some functions that are common to both `ares_query()` and
`ares_search()` have been refactored using the new DNS record parser.
See `ares_dns_record_create_query()` and
`ares_dns_query_reply_tostatus()`.
**Testing**
A new FV has been added to test the new function, which searches for a
DNS record containing an OPT RR with custom options value.
As part of this, I needed to enhance the mock DNS server to expect
request text (and assert that it matches actual request text). This is
because the FV needs to check that the request contains the correct OPT
RR.
**Documentation**
The man page docs have been updated to describe the new feature.
**Futures**
In the future, a new variant of `ares_send()` could be introduced in the
same vein (`ares_send_dnsrec()`). This could be used by
`ares_search_dnsrec()`. Moreover, we could migrate internal code to use
`ares_callback_dnsrec` as the default callback.
This will help to make the new DNS record parser the norm in C-Ares.
---------
Co-authored-by: Oliver Welsh (@oliverwelsh)
Rewrite configuration parsers using new memory safe parsing functions.
After CVE-2024-25629 its obvious that we need to prioritize again on
getting all the hand written parsers with direct pointer manipulation
replaced. They're just not safe and hard to audit. It was yet another
example of 20+yr old code having a memory safety issue just now coming
to light.
Though these parsers are definitely less efficient, they're written with
memory safety in mind, and any performance difference is going to be
meaningless for something that only happens once a while.
Fix By: Brad House (@bradh352)
Hello, I work on an application for Microsoft which uses c-ares to
perform DNS lookups. We have made some minor changes to the library over
time, and would like to contribute these back to the project in case
they are useful more widely. This PR adds a new channel init flag,
described below.
Please let me know if I can include any more information to make this PR
better/easier for you to review. Thanks!
**Summary**
When initializing a channel with `ares_init_options()`, if there are no
nameservers available (because `ARES_OPT_SERVERS` is not used and
`/etc/resolv.conf` is either empty or not available) then a default
local named server will be added to the channel.
However in some applications a local named server will never be
available. In this case, all subsequent queries on the channel will
fail.
If we know this ahead of time, then it may be preferred to fail channel
initialization directly rather than wait for the queries to fail. This
gives better visibility, since we know that the failure is due to
missing servers rather than something going wrong with the queries.
This PR adds a new flag `ARES_FLAG_NO_DFLT_SVR`, to indicate that a
default local named server should not be added to a channel in this
scenario. Instead, a new error `ARES_EINITNOSERVER` is returned and
initialization fails.
**Testing**
I have added 2 new FV tests:
- `ContainerNoDfltSvrEmptyInit` to test that initialization fails when
no nameservers are available and the flag is set.
- `ContainerNoDfltSvrFullInit` to test that initialization still
succeeds when the flag is set but other nameservers are available.
Existing FVs are all passing.
**Documentation**
I have had a go at manually updating the docs to describe the new
flag/error, but couldn't see any contributing guidance about testing
this. Please let me know if you'd like anything more here.
---------
Fix By: Oliver Welsh (@oliverwelsh)
It may be useful to wait for the queue to be empty under certain conditions (mainly test cases), expose a function to efficiently do this and rework test cases to use it.
Fix By: Brad House (@bradh352)
Regression introduced in 1.26.0, building c-ares with threading disabled results in ares_init{_options}() failing.
Also adds a new CI test case to prevent this regression in the future.
Fixes Bug: #699
Fix By: Brad House (@bradh352)
This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
This pull request adds six flags to instruct the parser under various circumstances to skip parsing of the returned RR records so the raw data can be retrieved.
Fixes Bug: #686
Fix By: Erik Lax (@eriklax)
The previous build system allowed overwriting of CFLAGS/CPPFLAGS/CXXFLAGS on the make command line. Switch to using AM_CFLAGS/AM_CPPFLAGS/AM_CXXFLAGS when we set our own flags for building which ensures they are kept even when a user tries to override.
Fixes Bug: #694
Fix By: Brad House (@bradh352)
It appears as though we should never sanity check the RR name vs the question name as some DNS servers may return results for alias records.
Fixes Bug: #683
Fix By: Brad House (@bradh352)
* cmake: avoid warning about non-existing include dir
In the Debian build logs I noticed the following warning:
cc1: warning: /build/c-ares-1.25.0/test/include: No such file or directory [-Wmissing-include-dirs]
This happened because ${CMAKE_INSTALL_INCLUDEDIR} had been added to
caresinternal. I believe it has been copied from the "real" lib
where it's used in the INSTALL_INTERFACE context. But because
caresinternal is never installed we don't need that include here.
* cmake: drop CARES_TOPLEVEL_DIR variable
The CARES_TOPLEVEL_DIR variable is the same as the automatically
created PROJECT_SOURCE_DIR variable. Let's stick to the official
one. Also because it is already used at places where CARES_TOPLEVEL_DIR
is used as well.
Fix By: Gregor Jasny (@gjasny)
Completely rework the autotools build system, issues have cropped up due to the complexity and could cause issues on even semi-modern Linux systems (Ubuntu 20.04 for example).
Changes include:
Remove all curl/xc/cares m4 helper files, they go overboard on detections of functions and datatypes. Go back to more plain autoconf macros as they've come a long way over the years.
Use known systems and heuristics to determine datatypes for functions like send() and recv(), rather than the error prone detection which required thousands of permutations and might still get it wrong.
Remove unneeded configure arguments like --enable-debug or --enable-optimize, its more common for people to simply pass their own CFLAGS on the command line.
Only require CARES_STATICLIB definition on Windows static builds, its not necessary ever for other systems, even when hiding non-public symbols.
Remove some function and definition detections that were never used in c-ares
The test framework is now embedded into the toplevel configure system, there was no need to chain build the test system as it is never built externally to c-ares.
As a side-effect of the changes, a configure run completes in about 25% of the original time.
This has been tested on various Linux distributions (of varying age), FreeBSD, MacOS, Windows (via MSYS2 with Mingw), and Solaris10/11 (by @dfandrich), AIX 7.3 (by @dfandrich). It is not unlikely that this may have broken more esoteric or legacy systems, and we'll likely need to be ready to accept bug reports and patches, but it has removed over 10k lines of build system code. It is very likely any issues that crop up will add far fewer lines of code to fix such systems.
Fixes Bug: #670
Fix By: Brad House (@bradh352)
* get rid of clashes with curl namespace
* remove warnings due to deprecated functionality
* reorder some macro calls to get rid of warnings due to being called in the wrong order
Fix By: Brad House (@bradh352)
New test cases depend on internal symbols for calculating timeouts.
Disable those test features if symbol hiding is enabled.
Fixes Bug: #664
Fix By: Brad House (@bradh352)
When doing ares_gethostbyname() or ares_getaddrinfo() with AF_UNSPEC, if ares_cancel() was called after one address class was returned but before the other address class, it would return ARES_SUCCESS rather than ARES_ECANCELLED.
Test case has been added for this specific condition.
Fixes Bug: #662
Fix By: Brad House (@bradh352)
GoogleTest should be unbundled. Google changed their guidance a few years back and modern versions of google test cannot build the bundling code file.
This PR also updates to use C++14 as is required by modern GoogleTest versions.
Fixes Bug: #506
Fix By: Brad House (@bradh352)
The parser for the sortlist has been rewritten to use the ares__buf_*() functions. This also resolves some known bugs in accepting invalid sortlist entries which should have caused parse failures.
Fixes Bug: #501
Fix By: Brad House (@bradh352)
As of c-ares 1.22.0, server timeouts were erroneously not incrementing server failures meaning the server in use wouldn't rotate. There was apparently never a test case for this condition.
This PR fixes the bug and adds a test case to ensure it behaves properly.
Fixes Bug: #650
Fix By: Brad House (@bradh352)
Some environments may send router advertisements on a link setting their link-local (fe80::/10) address as a valid DNS server to the remote system. This will cause a DNS entry to be created like `fe80::1%iface`, since all link-local network interfaces are technically part of the same /10 subnet, it must be told what interface to send packets through explicitly if there are multiple physical interfaces.
This PR adds support for the %iface modifier when setting DNS servers via `/etc/resolv.conf` as well as via `ares_set_servers_csv()`.
For MacOS and iOS it is assumed that libresolve will set the `sin6_scope_id` and should be supported, but my test systems don't seem to read the Router Advertisement for RDNSS link-local. Specifying the link-local dns server on MacOS via adig has been tested and confirmed working.
For Windows, this is similar to MacOS in that the system doesn't seem to honor the RDNSS RA, but specifying manually has been tested to work.
At this point, Android support does not exist.
Fixes Bug #462
Supersedes PR #463
Fix By: Brad House (@bradh352) and Serhii Purik (@sergvpurik)
Regression from c-ares 1.19.1, ARES_OPT_UDP_PORT and ARES_OPT_TCP_PORT are
specified from the user in host-byte order, but there was a regression that
caused it to be read as if it was network byte order.
Fixes Bug: #640
Reported By: @Flow86
Fix By: Brad House (@bradh352)
c-ares does not have any concept of thread-safety. It has always been 100% up to the implementor to ensure they never call c-ares from more than one thread at a time. This patch adds basic thread-safety support, which can be disabled at compile time if not desired. It uses a single recursive mutex per channel, which should be extremely quick when uncontested so overhead should be minimal.
Fixes Bug: #610
Also sets the stage to implement #611
Fix By: Brad House (@bradh352)
c-ares parses only antique version of options for timeout and number of retries from resolv.conf (`retrans` and `retry` are missing in modern documentation https://man7.org/linux/man-pages/man5/resolv.conf.5.html).
I add support of `attempts` and `timeout` options
Fix By: Ignat (@Kontakter)
When building for UWP (WindowsStore), additional headers are needed and some functions are not available. This also adds AppVeyor CI/CD support to catch these issues in the future.
Fix By: Deal (@halx99) and Brad House (@bradh352)