We've had reports of user-after-free type crashes in Windows cleanup
code for the Event Thread. In evaluating the code, it appeared there
were some memory leaks on per-connection handles that may have remained
open during shutdown, while trying to resolve that it became apparent
the methodology chosen may not have been the right one for interfacing
with the Windows AFD system as stability issues were seen during this
debugging process.
Since this system is completely undocumented, there was no clear
resolution path other than to switch to the *other* methodology which
involves directly opening `\Device\Afd`, rather than spawning a "peer
socket" to use to queue AFD operations.
The original methodology chosen more closely resembled what is employed
by [libuv](https://github.com/libuv/libuv) and given its widespread use
was the reason it was used. The new methodology more closely resembles
[wepoll](https://github.com/piscisaureus/wepoll).
Its not clear if there are any scalability or performance advantages or
disadvantages for either method. They both seem like different ways to
do the same thing, but this current way does seem more stable.
Fixes#798
Fix By: Brad House (@bradh352)
Hello, I work on an application for Microsoft which uses c-ares to
perform DNS lookups. We have made some minor changes to the library over
time, and would like to contribute these back to the project in case
they are useful more widely. This PR adds a new channel init flag,
described below.
Please let me know if I can include any more information to make this PR
better/easier for you to review. Thanks!
**Summary**
When initializing a channel with `ares_init_options()`, if there are no
nameservers available (because `ARES_OPT_SERVERS` is not used and
`/etc/resolv.conf` is either empty or not available) then a default
local named server will be added to the channel.
However in some applications a local named server will never be
available. In this case, all subsequent queries on the channel will
fail.
If we know this ahead of time, then it may be preferred to fail channel
initialization directly rather than wait for the queries to fail. This
gives better visibility, since we know that the failure is due to
missing servers rather than something going wrong with the queries.
This PR adds a new flag `ARES_FLAG_NO_DFLT_SVR`, to indicate that a
default local named server should not be added to a channel in this
scenario. Instead, a new error `ARES_EINITNOSERVER` is returned and
initialization fails.
**Testing**
I have added 2 new FV tests:
- `ContainerNoDfltSvrEmptyInit` to test that initialization fails when
no nameservers are available and the flag is set.
- `ContainerNoDfltSvrFullInit` to test that initialization still
succeeds when the flag is set but other nameservers are available.
Existing FVs are all passing.
**Documentation**
I have had a go at manually updating the docs to describe the new
flag/error, but couldn't see any contributing guidance about testing
this. Please let me know if you'd like anything more here.
---------
Fix By: Oliver Welsh (@oliverwelsh)
It may be useful to wait for the queue to be empty under certain conditions (mainly test cases), expose a function to efficiently do this and rework test cases to use it.
Fix By: Brad House (@bradh352)
Regression introduced in 1.26.0, building c-ares with threading disabled results in ares_init{_options}() failing.
Also adds a new CI test case to prevent this regression in the future.
Fixes Bug: #699
Fix By: Brad House (@bradh352)
This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
This pull request adds six flags to instruct the parser under various circumstances to skip parsing of the returned RR records so the raw data can be retrieved.
Fixes Bug: #686
Fix By: Erik Lax (@eriklax)
The previous build system allowed overwriting of CFLAGS/CPPFLAGS/CXXFLAGS on the make command line. Switch to using AM_CFLAGS/AM_CPPFLAGS/AM_CXXFLAGS when we set our own flags for building which ensures they are kept even when a user tries to override.
Fixes Bug: #694
Fix By: Brad House (@bradh352)
It appears as though we should never sanity check the RR name vs the question name as some DNS servers may return results for alias records.
Fixes Bug: #683
Fix By: Brad House (@bradh352)
* cmake: avoid warning about non-existing include dir
In the Debian build logs I noticed the following warning:
cc1: warning: /build/c-ares-1.25.0/test/include: No such file or directory [-Wmissing-include-dirs]
This happened because ${CMAKE_INSTALL_INCLUDEDIR} had been added to
caresinternal. I believe it has been copied from the "real" lib
where it's used in the INSTALL_INTERFACE context. But because
caresinternal is never installed we don't need that include here.
* cmake: drop CARES_TOPLEVEL_DIR variable
The CARES_TOPLEVEL_DIR variable is the same as the automatically
created PROJECT_SOURCE_DIR variable. Let's stick to the official
one. Also because it is already used at places where CARES_TOPLEVEL_DIR
is used as well.
Fix By: Gregor Jasny (@gjasny)
Completely rework the autotools build system, issues have cropped up due to the complexity and could cause issues on even semi-modern Linux systems (Ubuntu 20.04 for example).
Changes include:
Remove all curl/xc/cares m4 helper files, they go overboard on detections of functions and datatypes. Go back to more plain autoconf macros as they've come a long way over the years.
Use known systems and heuristics to determine datatypes for functions like send() and recv(), rather than the error prone detection which required thousands of permutations and might still get it wrong.
Remove unneeded configure arguments like --enable-debug or --enable-optimize, its more common for people to simply pass their own CFLAGS on the command line.
Only require CARES_STATICLIB definition on Windows static builds, its not necessary ever for other systems, even when hiding non-public symbols.
Remove some function and definition detections that were never used in c-ares
The test framework is now embedded into the toplevel configure system, there was no need to chain build the test system as it is never built externally to c-ares.
As a side-effect of the changes, a configure run completes in about 25% of the original time.
This has been tested on various Linux distributions (of varying age), FreeBSD, MacOS, Windows (via MSYS2 with Mingw), and Solaris10/11 (by @dfandrich), AIX 7.3 (by @dfandrich). It is not unlikely that this may have broken more esoteric or legacy systems, and we'll likely need to be ready to accept bug reports and patches, but it has removed over 10k lines of build system code. It is very likely any issues that crop up will add far fewer lines of code to fix such systems.
Fixes Bug: #670
Fix By: Brad House (@bradh352)
* get rid of clashes with curl namespace
* remove warnings due to deprecated functionality
* reorder some macro calls to get rid of warnings due to being called in the wrong order
Fix By: Brad House (@bradh352)
New test cases depend on internal symbols for calculating timeouts.
Disable those test features if symbol hiding is enabled.
Fixes Bug: #664
Fix By: Brad House (@bradh352)
When doing ares_gethostbyname() or ares_getaddrinfo() with AF_UNSPEC, if ares_cancel() was called after one address class was returned but before the other address class, it would return ARES_SUCCESS rather than ARES_ECANCELLED.
Test case has been added for this specific condition.
Fixes Bug: #662
Fix By: Brad House (@bradh352)
GoogleTest should be unbundled. Google changed their guidance a few years back and modern versions of google test cannot build the bundling code file.
This PR also updates to use C++14 as is required by modern GoogleTest versions.
Fixes Bug: #506
Fix By: Brad House (@bradh352)
The parser for the sortlist has been rewritten to use the ares__buf_*() functions. This also resolves some known bugs in accepting invalid sortlist entries which should have caused parse failures.
Fixes Bug: #501
Fix By: Brad House (@bradh352)
As of c-ares 1.22.0, server timeouts were erroneously not incrementing server failures meaning the server in use wouldn't rotate. There was apparently never a test case for this condition.
This PR fixes the bug and adds a test case to ensure it behaves properly.
Fixes Bug: #650
Fix By: Brad House (@bradh352)
Some environments may send router advertisements on a link setting their link-local (fe80::/10) address as a valid DNS server to the remote system. This will cause a DNS entry to be created like `fe80::1%iface`, since all link-local network interfaces are technically part of the same /10 subnet, it must be told what interface to send packets through explicitly if there are multiple physical interfaces.
This PR adds support for the %iface modifier when setting DNS servers via `/etc/resolv.conf` as well as via `ares_set_servers_csv()`.
For MacOS and iOS it is assumed that libresolve will set the `sin6_scope_id` and should be supported, but my test systems don't seem to read the Router Advertisement for RDNSS link-local. Specifying the link-local dns server on MacOS via adig has been tested and confirmed working.
For Windows, this is similar to MacOS in that the system doesn't seem to honor the RDNSS RA, but specifying manually has been tested to work.
At this point, Android support does not exist.
Fixes Bug #462
Supersedes PR #463
Fix By: Brad House (@bradh352) and Serhii Purik (@sergvpurik)
Regression from c-ares 1.19.1, ARES_OPT_UDP_PORT and ARES_OPT_TCP_PORT are
specified from the user in host-byte order, but there was a regression that
caused it to be read as if it was network byte order.
Fixes Bug: #640
Reported By: @Flow86
Fix By: Brad House (@bradh352)
c-ares does not have any concept of thread-safety. It has always been 100% up to the implementor to ensure they never call c-ares from more than one thread at a time. This patch adds basic thread-safety support, which can be disabled at compile time if not desired. It uses a single recursive mutex per channel, which should be extremely quick when uncontested so overhead should be minimal.
Fixes Bug: #610
Also sets the stage to implement #611
Fix By: Brad House (@bradh352)
c-ares parses only antique version of options for timeout and number of retries from resolv.conf (`retrans` and `retry` are missing in modern documentation https://man7.org/linux/man-pages/man5/resolv.conf.5.html).
I add support of `attempts` and `timeout` options
Fix By: Ignat (@Kontakter)
When building for UWP (WindowsStore), additional headers are needed and some functions are not available. This also adds AppVeyor CI/CD support to catch these issues in the future.
Fix By: Deal (@halx99) and Brad House (@bradh352)
For historic reasons, we have users depending on ares_set_servers_*()
to return ARES_SUCCESS when passing no servers and actually *clear*
the server list. It appears they do this for test cases to simulate
DNS unavailable or similar. Presumably they could achieve the same
effect in other ways (point to localhost on a port that isn't in use).
But it seems like this might be wide-spread enough to cause headaches
so we just will document and test for this behavior, clearly it hasn't
caused "issues" for anyone with the old behavior.
See: https://github.com/nodejs/node/pull/50800
Fix By: Brad House (@bradh352)
This PR implements a query cache at the lowest possible level, the actual dns request and response messages. Only successful and `NXDOMAIN` responses are cached. The lowest TTL in the response message determines the cache validity period for the response, and is capped at the configuration value for `qcache_max_ttl`. For `NXDOMAIN` responses, the SOA record is evaluated.
For a query to match the cache, the opcode, flags, and each question's class, type, and name are all evaluated. This is to prevent matching a cached entry for a subtly different query (such as if the RD flag is set on one request and not another).
For things like ares_getaddrinfo() or ares_search() that may spawn multiple queries, each individual message received is cached rather than the overarching response. This makes it possible for one query in the sequence to be purged from the cache while others still return cached results which means there is no chance of ever returning stale data.
We have had a lot of user requests to return TTLs on all the various parsers like `ares_parse_caa_reply()`, and likely this is because they want to implement caching mechanisms of their own, thus this PR should solve those issues as well.
Due to the internal data structures we have these days, this PR is less than 500 lines of new code.
Fixes#608
Fix By: Brad House (@bradh352)
The retry timeout values were using a fixed calculation which could cause multiple simultaneous queries to timeout and retry at the exact same time. If a DNS server is throttling requests, this could cause the issue to never self-resolve due to all requests recurring at the same instance again.
This PR also creates a maximum timeout option to make sure the random value selected does not exceed this value.
Fix By: Ignat (@Kontakter)
This PR implements ares_reinit() to safely reload a channel's configuration even if there are existing queries. This function can be called when system configuration is detected to be changed, however since c-ares isn't thread aware, care must be taken to ensure no other c-ares calls are in progress at the time this function is called. Also, this function may update the open file descriptor list so care must also be taken to wake any event loops and reprocess the list of file descriptors.
Fixes Bug #301
Fix By: Brad House (@bradh352)
adig previously performed manual parsing of the DNS records. Now it can focus strictly on formatting of output data for printing. It simply iterates across the parsed DNS packet and queries for the RRs, parameters for each RR, and the datatypes for each parameter. adig will now automatically pick up new RRs from the c-ares library due to the dynamic nature.
The adig format also now more closely resembles that of BIND's `dig` output.
A few more helpers needed to be added to the c-ares library that were missing. There ware a couple of minor bugs and enhancements also needed.
Example:
```
./adig -t ANY www.google.com
; <<>> c-ares DiG 1.21.0 <<>> www.google.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: RCODE, id: 23913
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: 0; udp: 512
;; QUESTION SECTION:
;www.google.com. IN ANY
;; ANSWER SECTION:
www.google.com. 162 IN A 142.251.107.99
www.google.com. 162 IN A 142.251.107.105
www.google.com. 162 IN A 142.251.107.103
www.google.com. 162 IN A 142.251.107.147
www.google.com. 162 IN A 142.251.107.104
www.google.com. 162 IN A 142.251.107.106
www.google.com. 162 IN AAAA 2607:f8b0:400c:c32::93
www.google.com. 162 IN AAAA 2607:f8b0:400c:c32::69
www.google.com. 162 IN AAAA 2607:f8b0:400c:c32::68
www.google.com. 162 IN AAAA 2607:f8b0:400c:c32::6a
www.google.com. 21462 IN HTTPS 1 . alpn="h2,h3"
;; MSG SIZE rcvd: 276
```
Fix By: Brad House (@bradh352)