On CI systems that can be overloaded, things like usleep() and select()
may not closely honor their timeouts, and can often be a multiple of the
requested timeout. Some tests, out of necessity need to rely on accurate
timing in order to test timeout conditions so this means test failures
when the skew is too large. Short of increasing timeouts to a point that
would make tests take an unreasonable amount of time, the alternative is
to make the OS honor the requested timeout more accurately.
On MacOS this means to set a realtime thread priority for the tests.
Other projects like libuv do this same thing. The code is taken from:
https://developer.apple.com/library/archive/technotes/tn2169/_index.html
Authored-By: Brad House (@bradh352)
GitHub actions supports running tests on various docker containers, move
Ubuntu 20.04 and Alpine tests to containers. Also move iOS testing to
GitHub actions since that runs on MacOS which is supported.
This should take additional load off of Cirrus-CI which consumes credits
like crazy. This leaves only FreeBSD and Linux ARM testing on Cirrus-CI.
Authored-By: Brad House (@bradh352)
Create a new data structure for a basic growable indexable array,
supporting the features you'd normally expect such as insert (at, last,
first), remove (at, last, first), and get (at, last, first). Internally
all data is stored in an appropriately sized array that can directly be
returned to the caller as a C array of the data type provided. The array
grows by powers of two, and has optimizations for head and tail
removals.
Array modifications can be risky (e.g. wrong reallocation sizes,
mis-sized memory moves, out-of-bounds access), so it makes sense to have
standardized code that is well tested.
The arrays used by the dns record parser / writer have been converted to
the new array data structure as have a few other instances.
Authored-By: Brad House (@bradh352)
TCP Fast Open (TFO) allows TCP connection establishment in 0-RTT when a
client and server have previously communicated. The SYN packet will also
contain the initial data packet from the client to the server. This
means there should be virtually no slowdown over UDP when both sides
support TCP FastOpen, which is unfortunately not always the case. For
instance, `1.1.1.1` appears to support TFO, however `8.8.8.8` does not.
This implementation supports Linux, Android, FreeBSD, MacOS, and iOS.
While Windows does have support for TCP FastOpen it does so via
completion APIs only, and that can't be used with polling APIs like used
by every other OS. We could implement it in the future if desired for
those using `ARES_OPT_EVENT_THREAD`, but it would probably require
adopting IOCP completely on Windows.
Sysctls are required to be set appropriately:
- Linux: `net.ipv4.tcp_fastopen`:
- `1` = client only (typically default)
- `2` = server only
- `3` = client and server
- MacOS: `net.inet.tcp.fastopen`
- `1` = client only
- `2` = server only
- `3` = client and server (typically default)
- FreeBSD: `net.inet.tcp.fastopen.server_enable` (boolean) and
`net.inet.tcp.fastopen.client_enable` (boolean)
This feature is always-on, when running on an OS with the capability
enabled. Though some middleboxes have impacted end-to-end TFO and caused
connectivity errors, all modern OSs perform automatic blackholing of IPs
that have issues with TFO. It is not expected this to cause any issues
in the modern day implementations.
This will also help with improving latency for future DoT and DoH
implementations.
Authored-By: Brad House (@bradh352)
Refactor some connection handling to reduce code duplication and to
unify the TCP and UDP codepaths a bit more. This will make some future
changes easier to make.
This also does some structure renaming to better conform with current
standards:
- `struct server_state` -> `ares_server_t`
- `struct server_connection` -> `ares_conn_t`
- `struct query` -> `ares_query_t`
Authored-by: Brad House (@bradh352)
During the implementation of server cookies, test cases were missing
to validate the server cookie in a prior reply was passed back, and
it turns out they were not.
This also adds tests for verification.
Fix By: Brad House (@bradh352)
DNS cookies are a simple form of learned mutual authentication supported
by most DNS server implementations these days and can help prevent DNS
Cache Poisoning attacks for clients and DNS amplification attacks for
servers.
Fixes#620
Fix By: Brad House (@bradh352)
c-ares is getting larger these days and we keep adding source files to
the same directory so it can be hard to differentiate core c-ares
implementation from library/utility functions. Lets make some
subdirectories to help with that and shuffle files around.
Fix By: Brad House (@bradh352)
UDP is connectionless, but systems use ICMP unreachable messages to
indicate there is no ability to reach the host or port, which can result
in a `send()` returning an error like `ECONNREFUSED`. We need to handle
non-retryable codes like that to treat it as a connection failure so we
requeue any queries on that connection to another connection/server
immediately. Otherwise what happens is we just wait on the timeout to
expire which can greatly increase the time required to get a definitive
message.
This also adds a test case to verify the behavior.
Fixes#819
Fix By: Brad Houes (@bradh352)
c-ares utilizes recursion for some operations, and some of these
processes can have unintended side effects, such as if a callback
is called that then recurses into the same function. This can cause
strange cleanup conditions that lead to crashes.
Try to disassociate queries with connections as early as possible and
move cleaning up unneeded connections to its own scan rather than
trying to detect each time a query is disassociated from a connection.
Fix By: Brad House (@bradh352)
In c-ares 1.30.0 we started validating strings parsed are printable.
This caused a regression in a pycares test case due to a wrong response
code being returned as the error was being propagated from a different
section of code that was assuming the only possible failure condition
was out-of-memory.
This PR adds a fix for this and also a test case to validate it.
Ref: https://github.com/saghul/pycares/issues/200
Fix By: Brad House (@bradh352)
We've had reports of user-after-free type crashes in Windows cleanup
code for the Event Thread. In evaluating the code, it appeared there
were some memory leaks on per-connection handles that may have remained
open during shutdown, while trying to resolve that it became apparent
the methodology chosen may not have been the right one for interfacing
with the Windows AFD system as stability issues were seen during this
debugging process.
Since this system is completely undocumented, there was no clear
resolution path other than to switch to the *other* methodology which
involves directly opening `\Device\Afd`, rather than spawning a "peer
socket" to use to queue AFD operations.
The original methodology chosen more closely resembled what is employed
by [libuv](https://github.com/libuv/libuv) and given its widespread use
was the reason it was used. The new methodology more closely resembles
[wepoll](https://github.com/piscisaureus/wepoll).
Its not clear if there are any scalability or performance advantages or
disadvantages for either method. They both seem like different ways to
do the same thing, but this current way does seem more stable.
Fixes#798
Fix By: Brad House (@bradh352)
There was a thread passed data for processing that was cleaned up before
thread exit, and it could cause a use-after-free in the test suite. This
doesn't affect c-ares. This was found during trying to reproduce #798,
but appears unrelated, don't use a helper thread as it isn't necessary.
Fix By: Brad House (@bradh352)
As per #738, there are usecases where the DNS TXT record strings should
not be concatenated like RFC 7208 indicates. We cannot break ABI with
those using the new API, so we need to support retrieving the
concatenated version as well as a new API to retrieve the individual
strings which will be used by `ares_parse_text_reply_ext()` to restore
the old behavior prior to c-ares 1.20.
Fixes Issue: #738
Fix By: Brad House (@bradh352)
This PR enables DNS 0x20 as per
https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 .
DNS 0x20 adds additional entropy to the request by randomly altering the
case of the DNS question to help prevent cache poisoning attacks.
Google DNS has implemented this support as of 2023, even though this is
a proposed and expired standard from 2008:
https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M
There have been documented cases of name server and caching server
non-conformance, though it is expected to become more rare, especially
since Google has started using this.
This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently
disabled by default. The test cases do however enable this flag to
validate this feature.
Implementors using this flag will notice that responses will retain the
mixed case, but since DNS names are case-insensitive, any proper
implementation should not be impacted.
There is currently no fallback mechanism implemented as it isn't
immediately clear how this may affect a stub resolver like c-ares where
we aren't querying the authoritative name server, but instead an
intermediate recursive resolver where some domains may return invalid
results while others return valid results, all while querying the same
nameserver. Likely using DNS cookies as suggested by #620 is a better
mechanism to fight cache poisoning attacks for stub resolvers.
TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag
is specified since they are not subject to cache poisoning attacks.
Fixes Issue: #795
Fix By: Brad House (@bradh352)
The header inclusion logic in c-ares is hard to follow. Lets try to
simplify the way it works to make it easier to understand and less
likely to break on new code changes. There's still more work to be done,
but this is a good start at simplifying things.
Fix By: Brad House (@bradh352)
MSVC has been building with /W3 which isn't considered a safe level for
modern code. /W4 is recommended, but it too is lacking some recommended
options, so we enable /W4 and also the recommended options. We do,
however, have to disable a couple of options due to Windows headers not
being fully compliant sometimes as well as some things we do in c-ares
that it doesn't like, but aren't actually bad.
Fix By: Brad House (@bradh352)
The query cache should be enabled by default. This will help with
determining proper timeouts for #736. It can still be disabled by
setting the ttl to 0. There should be no negative consequences of this
in real-world scenarios since DNS is based on the TTL concept and
upstream servers will cache results and not recurse based on this
information anyhow. DNS queries and responses are very small, this
should have negligible impact on memory consumption.
Fix By: Brad House (@bradh352)
Add code annotations for ignoring specific code paths for coverage
calculations. The primary purpose of this is to make it easy to see the
code paths that we could (and probably should) write test cases for, as
these would have the most impact on delivery of a stable product.
The annotations used are:
`LCOV_EXCL_LINE: <designation>`, `LCOV_EXCL_START: <designation>`,
`LCOV_EXCL_STOP`
Unfortunately `LCOV_EXCL_BR_LINE` does not appear to be supported by
coveralls as it would have been a more elegant solution over START/STOP.
We specifically include the `<designation>` not just for future
reference but because it makes it easy to identify in case we want to
address these conditions in a different way in the future.
The main areas designated for exclusion are:
1. `OutOfMemory` - these are hard to test cases, and on modern systems,
are likely to never occur due to optimistic memory allocations, which
can then later cause the kernel to terminate your application due to
memory not actually being available. c-ares does have *some* testing
framework for this, if we wish to expand in the future, we can easily
use sed to get rid of of these annotations.
2. `DefensiveCoding` - these are impossible to reach paths at the point
in time the code was written. They are there for defensive coding in
case code is refactored in the future to prevent unexpected behavior.
3. `UntestablePath` - these are code paths that aren't possible to test,
such as failure of a system call.
4. `FallbackCode` - This is an entire set of code that is untestable
because its not able to simulate a failure of the primary path.
This PR also does add some actual coverage in the test cases where it is
easy to do.
Fix By: Brad House (@bradh352)
With the current c-ares parser, as per PR #765 parsing was broken due to
validation that didn't understand the `SIG` record class. This PR adds
basic, non validating, and incomplete support for the `SIG` record type.
The additional `KEY` and `NXT` which would be required for additional
verification of the records is not implemented. It also does not store
the raw unprocessed RR data that would be required for the validation.
The primary purpose of this PR is to be able to recognize the record and
handle some periphery aspects such as validation of the class associated
with the RR and to not honor the TTL in the RR in the c-ares query cache
since it will always be 0.
Fixes#765
Fix By: Brad House (@bradh352)
As per Issue #760, the use of `struct timeval` is meant for only time
differentials, however it could be used to denote an exact timeout. This
could lead to y2k38 issues on some platforms.
Fixes Issue #760
Fix By: Brad House (@bradh352)