The main purpose of this PR is to modularize the communication library,
and streamline the code paths for TCP and UDP into a single flow. This
will help us add additional flows such as TLS, and also make sure these
communication functions return a known set of error codes so that
additional error codes can be added for new flows in the future.
It also adds a new optional callback of
`ares_set_notify_pending_write_callback()` that will assist in
aggregating data from multiple queries into a single write operation.
This doesn't apply to UDP, but on TCP and especially TLS in the future
this can be a significant win. This is automatically applied for the
Event Thread.
It also fixes a long standing issue if UDP connection became saturated
and started returning `EWOULDBLOCK` or `EAGAIN` it was treated as a
failure. Since this is more inline with the TCP code now, it will wait
on a write event to retry.
Finally there are additional cleanups due to not needing to be able to
retrieve socket errnos all over the place.
Authored-By: Brad House (@bradh352)
DNS cookies are a simple form of learned mutual authentication supported
by most DNS server implementations these days and can help prevent DNS
Cache Poisoning attacks for clients and DNS amplification attacks for
servers.
Fixes#620
Fix By: Brad House (@bradh352)
As per #738, there are usecases where the DNS TXT record strings should
not be concatenated like RFC 7208 indicates. We cannot break ABI with
those using the new API, so we need to support retrieving the
concatenated version as well as a new API to retrieve the individual
strings which will be used by `ares_parse_text_reply_ext()` to restore
the old behavior prior to c-ares 1.20.
Fixes Issue: #738
Fix By: Brad House (@bradh352)
This PR enables DNS 0x20 as per
https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 .
DNS 0x20 adds additional entropy to the request by randomly altering the
case of the DNS question to help prevent cache poisoning attacks.
Google DNS has implemented this support as of 2023, even though this is
a proposed and expired standard from 2008:
https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M
There have been documented cases of name server and caching server
non-conformance, though it is expected to become more rare, especially
since Google has started using this.
This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently
disabled by default. The test cases do however enable this flag to
validate this feature.
Implementors using this flag will notice that responses will retain the
mixed case, but since DNS names are case-insensitive, any proper
implementation should not be impacted.
There is currently no fallback mechanism implemented as it isn't
immediately clear how this may affect a stub resolver like c-ares where
we aren't querying the authoritative name server, but instead an
intermediate recursive resolver where some domains may return invalid
results while others return valid results, all while querying the same
nameserver. Likely using DNS cookies as suggested by #620 is a better
mechanism to fight cache poisoning attacks for stub resolvers.
TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag
is specified since they are not subject to cache poisoning attacks.
Fixes Issue: #795
Fix By: Brad House (@bradh352)
Watt-32 (https://www.watt-32.net/) support has been broken for a long
time. Patch c-ares to fix Watt-32 support and also gets rid of the
`WIN32` macro which adds confusion, only use `USE_WINSOCK` macro.
Add a CI/CD task to build c-ares on Windows using MSVC with Watt-32.
Fixes Issue: #780
Fix By: Brad House (@bradh352)
With the current c-ares parser, as per PR #765 parsing was broken due to
validation that didn't understand the `SIG` record class. This PR adds
basic, non validating, and incomplete support for the `SIG` record type.
The additional `KEY` and `NXT` which would be required for additional
verification of the records is not implemented. It also does not store
the raw unprocessed RR data that would be required for the validation.
The primary purpose of this PR is to be able to recognize the record and
handle some periphery aspects such as validation of the class associated
with the RR and to not honor the TTL in the RR in the c-ares query cache
since it will always be 0.
Fixes#765
Fix By: Brad House (@bradh352)
at https://github.com/c-ares/c-ares/pull/601#issuecomment-1801935063 you
chose not to scatter `const` on the public interface because of the plan
- now realised - to add threading to c-ares, and in the expectation that
even read operations would need to lock the mutex.
But the threading implementation has a _pointer_ to a mutex inside the
ares channel and as I understand it, that means that it is just fine to
mark `ares__channel_lock` (and `ares__channel_unlock`) as taking a
`const` channel. It is the pointed-to mutex that is not constant, but C
does not propagate `const`-ness through pointers.
This PR sprinkles const where appropriate on public interfaces.
Fix By: David Hotham (@dimbleby)
**Summary**
This PR adds a server state callback that is invoked whenever a query to
a DNS server finishes.
The callback is invoked with the server details (as a string), a boolean
indicating whether the query succeeded or failed, flags describing the
query (currently just indicating whether TCP or UDP was used), and
custom userdata.
This can be used by user applications to gain observability into DNS
server health and usage. For example, alerts when a DNS server
fails/recovers or metrics to track how often a DNS server is used and
responds successfully.
**Testing**
Three new regression tests `MockChannelTest.ServStateCallback*` have
been added to test the new callback in different success/failure
scenarios.
Fix By: Oliver Welsh (@oliverwelsh)
**Summary**
By default c-ares will select the server with the least number of
consecutive failures when sending a query. However, this means that if a
server temporarily goes down and hits failures (e.g. a transient network
issue), then that server will never be retried until all other servers
hit the same number of failures.
This is an issue if the failed server is preferred to other servers in
the list. For example if a primary server and a backup server are
configured.
This PR adds new server failover retry behavior, where failed servers
are retried with small probability after a minimum delay has passed. The
probability and minimum delay are configurable via the
`ARES_OPT_SERVER_FAILOVER` option. By default c-ares will use a
probability of 10% and a minimum delay of 5 seconds.
In addition, this PR includes a small change to always close out
connections to servers which have hit failures, even with
`ARES_FLAG_STAYOPEN`. It's possible that resetting the connection can
resolve some server issues (e.g. by resetting the source port).
**Testing**
A new set of regression tests have been added to test the new server
failover retry behavior.
Fixes Issue: #717
Fix By: Oliver Welsh (@oliverwelsh)
Multiple functions have been deprecated over the years, annotate them
with attribute deprecated.
When possible show a message about their replacements.
This is a continuation/completion of PR #706
Fix By: Cristian Rodríguez (@crrodriguez)
c-ares has historically passed around raw dns packets in binary form.
Now that we have a new parser, and messages are already parsed
internally, lets pass around that parsed message rather than requiring
multiple parse attempts on the same message. Also add a new
`ares_send_dnsrec()` and `ares_query_dnsrec()` similar to
`ares_search_dnsrec()` added with PR #719 that can return the pointer to
the `ares_dns_record_t` to the caller enqueuing queries and rework
`ares_search_dnsrec()` to use `ares_send_dnsrec()` internally.
Fix By: Brad House (@bradh352)
This PR adds a new function `ares_search_dnsrec()` to search for records
using the new DNS record parser.
The function takes an arbitrary DNS record object to search (that must
represent a query for a single name). The function takes a new callback
type, `ares_callback_dnsrec`, that is invoked with a parsed DNS record
object rather than the raw buffer(+length).
The original motivation for this change is to provide support for
[draft-kaplan-enum-sip-routing-04](https://datatracker.ietf.org/doc/html/draft-kaplan-enum-sip-routing-04);
when routing phone calls using an ENUM server, it can be useful to
include identifying source information in an OPT RR options value, to
help select the appropriate route for the call. The new function allows
for more customisable searches like this.
**Summary of code changes**
A new function `ares_search_dnsrec()` has been added and exposed.
Moreover, the entire `ares_search_int()` internal code flow has been
refactored to use parsed DNS record objects and the new DNS record
parser. The DNS record object is passed through the `search_query`
structure by encoding/decoding to/from a buffer (if multiple search
domains are used). A helper function `ares_dns_write_query_altname()` is
used to re-write the DNS record object with a new query name (used to
append search domains).
`ares_search()` is now a wrapper around the new internal code, where the
DNS record object is created based on the name, class and type
parameters.
The new function uses a new callback type, `ares_callback_dnsrec`. This
is invoked with a parsed DNS record object. For now, we convert from
`ares_callback` to this new type using `ares__dnsrec_convert_cb()`.
Some functions that are common to both `ares_query()` and
`ares_search()` have been refactored using the new DNS record parser.
See `ares_dns_record_create_query()` and
`ares_dns_query_reply_tostatus()`.
**Testing**
A new FV has been added to test the new function, which searches for a
DNS record containing an OPT RR with custom options value.
As part of this, I needed to enhance the mock DNS server to expect
request text (and assert that it matches actual request text). This is
because the FV needs to check that the request contains the correct OPT
RR.
**Documentation**
The man page docs have been updated to describe the new feature.
**Futures**
In the future, a new variant of `ares_send()` could be introduced in the
same vein (`ares_send_dnsrec()`). This could be used by
`ares_search_dnsrec()`. Moreover, we could migrate internal code to use
`ares_callback_dnsrec` as the default callback.
This will help to make the new DNS record parser the norm in C-Ares.
---------
Co-authored-by: Oliver Welsh (@oliverwelsh)
Hello, I work on an application for Microsoft which uses c-ares to
perform DNS lookups. We have made some minor changes to the library over
time, and would like to contribute these back to the project in case
they are useful more widely. This PR adds a new channel init flag,
described below.
Please let me know if I can include any more information to make this PR
better/easier for you to review. Thanks!
**Summary**
When initializing a channel with `ares_init_options()`, if there are no
nameservers available (because `ARES_OPT_SERVERS` is not used and
`/etc/resolv.conf` is either empty or not available) then a default
local named server will be added to the channel.
However in some applications a local named server will never be
available. In this case, all subsequent queries on the channel will
fail.
If we know this ahead of time, then it may be preferred to fail channel
initialization directly rather than wait for the queries to fail. This
gives better visibility, since we know that the failure is due to
missing servers rather than something going wrong with the queries.
This PR adds a new flag `ARES_FLAG_NO_DFLT_SVR`, to indicate that a
default local named server should not be added to a channel in this
scenario. Instead, a new error `ARES_EINITNOSERVER` is returned and
initialization fails.
**Testing**
I have added 2 new FV tests:
- `ContainerNoDfltSvrEmptyInit` to test that initialization fails when
no nameservers are available and the flag is set.
- `ContainerNoDfltSvrFullInit` to test that initialization still
succeeds when the flag is set but other nameservers are available.
Existing FVs are all passing.
**Documentation**
I have had a go at manually updating the docs to describe the new
flag/error, but couldn't see any contributing guidance about testing
this. Please let me know if you'd like anything more here.
---------
Fix By: Oliver Welsh (@oliverwelsh)
Add a function to request the number of active queries from an ares
channel. This will return the number of inflight requests to dns
servers. Some functions like `ares_getaddrinfo()` when using `AF_UNSPEC`
may enqueue multiple queries which will be reflected in this count.
In the future, if we implement support for queuing (e.g. for throttling
purposes), and/or implement support for tracking user-requested queries
(e.g. for cancelation), we can provide additional functions for
inspecting those queues.
Fix By: Brad House (@bradh352)
It may be useful to wait for the queue to be empty under certain conditions (mainly test cases), expose a function to efficiently do this and rework test cases to use it.
Fix By: Brad House (@bradh352)
This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
This pull request adds six flags to instruct the parser under various circumstances to skip parsing of the returned RR records so the raw data can be retrieved.
Fixes Bug: #686
Fix By: Erik Lax (@eriklax)
External integrations don't need sys/random.h in order to compile, remove the dependency. Try to fix building on legacy MacOS versions.
Fixes Issue: #682
Fix By: Brad House (@bradh352)
Completely rework the autotools build system, issues have cropped up due to the complexity and could cause issues on even semi-modern Linux systems (Ubuntu 20.04 for example).
Changes include:
Remove all curl/xc/cares m4 helper files, they go overboard on detections of functions and datatypes. Go back to more plain autoconf macros as they've come a long way over the years.
Use known systems and heuristics to determine datatypes for functions like send() and recv(), rather than the error prone detection which required thousands of permutations and might still get it wrong.
Remove unneeded configure arguments like --enable-debug or --enable-optimize, its more common for people to simply pass their own CFLAGS on the command line.
Only require CARES_STATICLIB definition on Windows static builds, its not necessary ever for other systems, even when hiding non-public symbols.
Remove some function and definition detections that were never used in c-ares
The test framework is now embedded into the toplevel configure system, there was no need to chain build the test system as it is never built externally to c-ares.
As a side-effect of the changes, a configure run completes in about 25% of the original time.
This has been tested on various Linux distributions (of varying age), FreeBSD, MacOS, Windows (via MSYS2 with Mingw), and Solaris10/11 (by @dfandrich), AIX 7.3 (by @dfandrich). It is not unlikely that this may have broken more esoteric or legacy systems, and we'll likely need to be ready to accept bug reports and patches, but it has removed over 10k lines of build system code. It is very likely any issues that crop up will add far fewer lines of code to fix such systems.
Fixes Bug: #670
Fix By: Brad House (@bradh352)
Some environments may send router advertisements on a link setting their link-local (fe80::/10) address as a valid DNS server to the remote system. This will cause a DNS entry to be created like `fe80::1%iface`, since all link-local network interfaces are technically part of the same /10 subnet, it must be told what interface to send packets through explicitly if there are multiple physical interfaces.
This PR adds support for the %iface modifier when setting DNS servers via `/etc/resolv.conf` as well as via `ares_set_servers_csv()`.
For MacOS and iOS it is assumed that libresolve will set the `sin6_scope_id` and should be supported, but my test systems don't seem to read the Router Advertisement for RDNSS link-local. Specifying the link-local dns server on MacOS via adig has been tested and confirmed working.
For Windows, this is similar to MacOS in that the system doesn't seem to honor the RDNSS RA, but specifying manually has been tested to work.
At this point, Android support does not exist.
Fixes Bug #462
Supersedes PR #463
Fix By: Brad House (@bradh352) and Serhii Purik (@sergvpurik)