As per #852 searching is failing, partially it is due to the ndots value
not defaulting to a proper value on linux, and partially due to
systemd-resolved returning the wrong error codes.
This PR fixes the first issue and adds containerized test cases to
validate the behavior and prevent issues in the future.
Reported-By: Hans-Christian Egtvedt (@egtvedt) and Mikael Lindemann(@mikaellindemann)
Authored-By: Brad House (@bradh352)
TCP Fast Open (TFO) allows TCP connection establishment in 0-RTT when a
client and server have previously communicated. The SYN packet will also
contain the initial data packet from the client to the server. This
means there should be virtually no slowdown over UDP when both sides
support TCP FastOpen, which is unfortunately not always the case. For
instance, `1.1.1.1` appears to support TFO, however `8.8.8.8` does not.
This implementation supports Linux, Android, FreeBSD, MacOS, and iOS.
While Windows does have support for TCP FastOpen it does so via
completion APIs only, and that can't be used with polling APIs like used
by every other OS. We could implement it in the future if desired for
those using `ARES_OPT_EVENT_THREAD`, but it would probably require
adopting IOCP completely on Windows.
Sysctls are required to be set appropriately:
- Linux: `net.ipv4.tcp_fastopen`:
- `1` = client only (typically default)
- `2` = server only
- `3` = client and server
- MacOS: `net.inet.tcp.fastopen`
- `1` = client only
- `2` = server only
- `3` = client and server (typically default)
- FreeBSD: `net.inet.tcp.fastopen.server_enable` (boolean) and
`net.inet.tcp.fastopen.client_enable` (boolean)
This feature is always-on, when running on an OS with the capability
enabled. Though some middleboxes have impacted end-to-end TFO and caused
connectivity errors, all modern OSs perform automatic blackholing of IPs
that have issues with TFO. It is not expected this to cause any issues
in the modern day implementations.
This will also help with improving latency for future DoT and DoH
implementations.
Authored-By: Brad House (@bradh352)
DNS cookies are a simple form of learned mutual authentication supported
by most DNS server implementations these days and can help prevent DNS
Cache Poisoning attacks for clients and DNS amplification attacks for
servers.
Fixes#620
Fix By: Brad House (@bradh352)
c-ares is getting larger these days and we keep adding source files to
the same directory so it can be hard to differentiate core c-ares
implementation from library/utility functions. Lets make some
subdirectories to help with that and shuffle files around.
Fix By: Brad House (@bradh352)
We've had reports of user-after-free type crashes in Windows cleanup
code for the Event Thread. In evaluating the code, it appeared there
were some memory leaks on per-connection handles that may have remained
open during shutdown, while trying to resolve that it became apparent
the methodology chosen may not have been the right one for interfacing
with the Windows AFD system as stability issues were seen during this
debugging process.
Since this system is completely undocumented, there was no clear
resolution path other than to switch to the *other* methodology which
involves directly opening `\Device\Afd`, rather than spawning a "peer
socket" to use to queue AFD operations.
The original methodology chosen more closely resembled what is employed
by [libuv](https://github.com/libuv/libuv) and given its widespread use
was the reason it was used. The new methodology more closely resembles
[wepoll](https://github.com/piscisaureus/wepoll).
Its not clear if there are any scalability or performance advantages or
disadvantages for either method. They both seem like different ways to
do the same thing, but this current way does seem more stable.
Fixes#798
Fix By: Brad House (@bradh352)
There was a thread passed data for processing that was cleaned up before
thread exit, and it could cause a use-after-free in the test suite. This
doesn't affect c-ares. This was found during trying to reproduce #798,
but appears unrelated, don't use a helper thread as it isn't necessary.
Fix By: Brad House (@bradh352)
This PR enables DNS 0x20 as per
https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 .
DNS 0x20 adds additional entropy to the request by randomly altering the
case of the DNS question to help prevent cache poisoning attacks.
Google DNS has implemented this support as of 2023, even though this is
a proposed and expired standard from 2008:
https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M
There have been documented cases of name server and caching server
non-conformance, though it is expected to become more rare, especially
since Google has started using this.
This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently
disabled by default. The test cases do however enable this flag to
validate this feature.
Implementors using this flag will notice that responses will retain the
mixed case, but since DNS names are case-insensitive, any proper
implementation should not be impacted.
There is currently no fallback mechanism implemented as it isn't
immediately clear how this may affect a stub resolver like c-ares where
we aren't querying the authoritative name server, but instead an
intermediate recursive resolver where some domains may return invalid
results while others return valid results, all while querying the same
nameserver. Likely using DNS cookies as suggested by #620 is a better
mechanism to fight cache poisoning attacks for stub resolvers.
TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag
is specified since they are not subject to cache poisoning attacks.
Fixes Issue: #795
Fix By: Brad House (@bradh352)
The query cache should be enabled by default. This will help with
determining proper timeouts for #736. It can still be disabled by
setting the ttl to 0. There should be no negative consequences of this
in real-world scenarios since DNS is based on the TTL concept and
upstream servers will cache results and not recurse based on this
information anyhow. DNS queries and responses are very small, this
should have negligible impact on memory consumption.
Fix By: Brad House (@bradh352)
As per Issue #760, the use of `struct timeval` is meant for only time
differentials, however it could be used to denote an exact timeout. This
could lead to y2k38 issues on some platforms.
Fixes Issue #760
Fix By: Brad House (@bradh352)
This PR adds a new function `ares_search_dnsrec()` to search for records
using the new DNS record parser.
The function takes an arbitrary DNS record object to search (that must
represent a query for a single name). The function takes a new callback
type, `ares_callback_dnsrec`, that is invoked with a parsed DNS record
object rather than the raw buffer(+length).
The original motivation for this change is to provide support for
[draft-kaplan-enum-sip-routing-04](https://datatracker.ietf.org/doc/html/draft-kaplan-enum-sip-routing-04);
when routing phone calls using an ENUM server, it can be useful to
include identifying source information in an OPT RR options value, to
help select the appropriate route for the call. The new function allows
for more customisable searches like this.
**Summary of code changes**
A new function `ares_search_dnsrec()` has been added and exposed.
Moreover, the entire `ares_search_int()` internal code flow has been
refactored to use parsed DNS record objects and the new DNS record
parser. The DNS record object is passed through the `search_query`
structure by encoding/decoding to/from a buffer (if multiple search
domains are used). A helper function `ares_dns_write_query_altname()` is
used to re-write the DNS record object with a new query name (used to
append search domains).
`ares_search()` is now a wrapper around the new internal code, where the
DNS record object is created based on the name, class and type
parameters.
The new function uses a new callback type, `ares_callback_dnsrec`. This
is invoked with a parsed DNS record object. For now, we convert from
`ares_callback` to this new type using `ares__dnsrec_convert_cb()`.
Some functions that are common to both `ares_query()` and
`ares_search()` have been refactored using the new DNS record parser.
See `ares_dns_record_create_query()` and
`ares_dns_query_reply_tostatus()`.
**Testing**
A new FV has been added to test the new function, which searches for a
DNS record containing an OPT RR with custom options value.
As part of this, I needed to enhance the mock DNS server to expect
request text (and assert that it matches actual request text). This is
because the FV needs to check that the request contains the correct OPT
RR.
**Documentation**
The man page docs have been updated to describe the new feature.
**Futures**
In the future, a new variant of `ares_send()` could be introduced in the
same vein (`ares_send_dnsrec()`). This could be used by
`ares_search_dnsrec()`. Moreover, we could migrate internal code to use
`ares_callback_dnsrec` as the default callback.
This will help to make the new DNS record parser the norm in C-Ares.
---------
Co-authored-by: Oliver Welsh (@oliverwelsh)
It may be useful to wait for the queue to be empty under certain conditions (mainly test cases), expose a function to efficiently do this and rework test cases to use it.
Fix By: Brad House (@bradh352)
This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
New test cases depend on internal symbols for calculating timeouts.
Disable those test features if symbol hiding is enabled.
Fixes Bug: #664
Fix By: Brad House (@bradh352)
When doing ares_gethostbyname() or ares_getaddrinfo() with AF_UNSPEC, if ares_cancel() was called after one address class was returned but before the other address class, it would return ARES_SUCCESS rather than ARES_ECANCELLED.
Test case has been added for this specific condition.
Fixes Bug: #662
Fix By: Brad House (@bradh352)
Some environments may send router advertisements on a link setting their link-local (fe80::/10) address as a valid DNS server to the remote system. This will cause a DNS entry to be created like `fe80::1%iface`, since all link-local network interfaces are technically part of the same /10 subnet, it must be told what interface to send packets through explicitly if there are multiple physical interfaces.
This PR adds support for the %iface modifier when setting DNS servers via `/etc/resolv.conf` as well as via `ares_set_servers_csv()`.
For MacOS and iOS it is assumed that libresolve will set the `sin6_scope_id` and should be supported, but my test systems don't seem to read the Router Advertisement for RDNSS link-local. Specifying the link-local dns server on MacOS via adig has been tested and confirmed working.
For Windows, this is similar to MacOS in that the system doesn't seem to honor the RDNSS RA, but specifying manually has been tested to work.
At this point, Android support does not exist.
Fixes Bug #462
Supersedes PR #463
Fix By: Brad House (@bradh352) and Serhii Purik (@sergvpurik)
Regression from c-ares 1.19.1, ARES_OPT_UDP_PORT and ARES_OPT_TCP_PORT are
specified from the user in host-byte order, but there was a regression that
caused it to be read as if it was network byte order.
Fixes Bug: #640
Reported By: @Flow86
Fix By: Brad House (@bradh352)
`ares_channel` is defined as `typedef struct ares_channeldata *ares_channel;`. The problem with this, is it embeds the pointer into the typedef, which means an `ares_channel` can never be declared as `const` as if you write `const ares_channel channel`, that expands to `struct ares_channeldata * const ares_channel` and not `const struct ares_channeldata *channel`.
We will now typedef `ares_channel_t` as `typedef struct ares_channeldata ares_channel_t;`, so if you write `const ares_channel_t *channel`, it properly expands to `const struct ares_channeldata *channel`.
We are maintaining the old typedef for API compatibility with existing integrations, and due to typedef expansion this should not even cause any compiler warnings for existing code. There are no ABI implications with this change. I could be convinced to keep existing public functions as `ares_channel` if a sufficient argument exists, but internally we really need make this change for modern best practices.
This change will allow us to internally use `const ares_channel_t *` where appropriate. Whether or not we decide to change any public interfaces to use `const` may require further discussion on if there might be ABI implications (I don't think so, but I'm also not 100% sure what a compiler internally does with `const` when emitting machine code ... I think more likely ABI implications would occur going the opposite direction).
FYI, This PR was done via a combination of sed and clang-format, the only manual code change was the addition of the new typedef, and a couple doc fixes :)
Fix By: Brad House (@bradh352)
HOSTS FILE PROCESSING OVERVIEW
==============================
The hosts file on the system contains static entries to be processed locally
rather than querying the nameserver. Each row is an IP address followed by
a list of space delimited hostnames that match the ip address. This is used
for both forward and reverse lookups.
We are caching the entire parsed hosts file for performance reasons. Some
files may be quite sizable and as per Issue #458 can approach 1/2MB in size,
and the parse overhead on a rapid succession of queries can be quite large.
The entries are stored in forwards and backwards hashtables so we can get
O(1) performance on lookup. The file is cached until the file modification
timestamp changes (or 60s if there is no implemented stat() capability).
The hosts file processing is quite unique. It has to merge all related hosts
and ips into a single entry due to file formatting requirements. For
instance take the below:
```
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
192.168.1.1 host.example.com host
192.168.1.5 host.example.com host
2620🔢:1 host.example.com host6.example.com host6 host
```
This will yield 2 entries.
1) ips: `127.0.0.1,::1`
hosts: `localhost.localdomain,localhost`
2) ips: `192.168.1.1,192.168.1.5,2620🔢:1`
hosts: `host.example.com,host,host6.example.com,host6`
It could be argued that if searching for `192.168.1.1` that the `host6`
hostnames should not be returned, but this implementation will return them
since they are related (both ips have the fqdn of host.example.com). It is
unlikely this will matter in the real world.
Fix By: Brad House (@bradh352)
The test framework was using 100ms timeout passed to select(), and not using ares_timeout() to calculate the actual recommended value based on the queries in queue. Using ares_timeout() tests the functionality of ares_timeout() itself and will provide more responsive results.
Fix By: Brad House (@bradh352)
As per #266, TCP queries are basically broken. If we get a partial reply, things just don't work, but unlike UDP, TCP may get fragmented and we need to properly handle that.
I've started creating a basic parser/buffer framework for c-ares for memory safety reasons, but it also helps for things like this where we shouldn't be manually tracking positions and fetching only a couple of bytes at a time from a socket. This parser/buffer will be expanded and used more in the future.
This also resolves#206 by allowing NULL to be specified for some socket callbacks so they will auto-route to the built-in c-ares functions.
Fixes: #206, #266
Fix By: Brad House (@bradh352)
All files have their licence and copyright information clearly
identifiable. If not in the file header, they are set separately in
.reuse/dep5.
All used license texts are provided in LICENSES/
ares_gethostbyname() and ares_getaddrinfo() do a lot of similar things, however ares_getaddrinfo() has some desirable behaviors that should be imported into ares_gethostbyname(). For one, it sorts the address lists for the most likely to succeed based on the current system routes. Next, when AF_UNSPEC is specified, it properly handles search lists instead of first searching all of AF_INET6 then AF_INET, since ares_gethostbyname() searches in parallel. Therefore, this PR should also resolve the issues attempted in #94.
A few things this PR does:
1. ares_parse_a_reply() and ares_parse_aaaa_reply() had very similar code to translate struct ares_addrinfo into a struct hostent as well as into struct ares_addrttl/ares_addr6ttl this has been split out into helper functions of ares__addrinfo2hostent() and ares__addrinfo2addrttl() to prevent this duplicative code.
2. ares_getaddrinfo() was apparently never honoring HOSTALIASES, and this was discovered once ares_gethostbyname() was turned into a wrapper, the affected test cases started failing.
3. A slight API modification to save the query hostname into struct ares_addrinfo as the last element of name. Since this is the last element, and all user-level instances of struct ares_addrinfo are allocated internally by c-ares, this is not an ABI-breaking change nor would it impact any API compatibility. This was needed since struct hostent has an h_name element.
4. Test Framework: MockServer tests via TCP would fail if more than 1 request was received at a time which is common when ares_getaddrinfo() queries for both A and AAAA records simultaneously. Infact, this was a long standing issue in which the ares_getaddrinfo() test were bypassing TCP alltogether. This has been corrected, the message is now processed in a loop.
5. Some tests had to be updated for overall correctness as they were invalid but somehow passing prior to this change.
Change By: Brad House (@bradh352)
Some systems may return either NULL or a valid pointer on malloc(0). c-ares should never call malloc(0) so lets return NULL so we're more likely to find an issue if it were to occur.
The c-ares test suite was hardcoded to use port 5300 (and possibly 5301, 5302) for the test suite. Especially in containers, there may be no guarantee these ports are available and cause tests to fail when they could otherwise succeed. Instead, request the system to assign a port to use dynamically. This is now the default. To override, the test suite still takes the "-p <port>" option as it always has and will honor that.
Fix By: Anthony Penniston (@apenn-msft)
The rc4 function iterates over a buffer of size buffer_len who's maximum
value is INT_MAX with a counter of type short that is not guaranteed to
have maximum size INT_MAX.
In circumstances where short is narrower than int and where buffer_len
is larger than the maximum value of a short, it may be possible to loop
infinitely as counter will overflow and never be greater than or equal
to buffer_len.
The solution is to make the comparison be between types of equal width.
This commit defines counter as an int.
Fix By: Fionn Fitzmaurice (@fionn)