If a blank DNS name is used, the DNS query cache would fail due to an
invalid sanity check. This can be legitimate such as:
adig -t SOA .
This fixes that situation as well as a few other spots that were
uncovered and adds a test case to validate the behavior to ensure
it won't regress in the future.
Fixes#858
Reported-By: Nodar Chkuaselidze (@nodech)
Authored-By: Brad House (@bradh352)
When using EventThreads, the config change cleanup code might manipulate
the event update list if it uses file descriptors (such as on Linux).
This was being done without a lock. Rework the event enqueuing to handle
locking internally to prevent this and to simplify where it is used.
This was found by chance during an ASAN CI run.
Fix By: Brad House (@bradh352)
We've been using a lot of time on Cirrus-CI and our credits run out
quickly. MacOS costs 15 compute credits vs 3 compute
credits for Linux. Move MacOS testing to GitHub Actions.
Fix By: Brad House (@bradh352)
Make ahost a dependency of adig to prevent issues with them both
referencing ares_strcasecmp.c and ares_getopt.c. This appears to be
a bug in the Cmake generator for MSVC project files.
Fixes#796
Fix By: Brad House (@bradh352)
`ares__hosts_entry_to_hostent()` would allocate a separate buffer for
each address, but `ares_free_hostent()` expects a single allocation to
hold all addresses.
This PR fixes this issue and simplifies the logic by using the
already-existing `ares__addrinfo2hostent()` to write the hostent instead
of coming up with yet another way to write the structure.
Fixes#823
Fix By: Brad House (@bradh352)
UDP is connectionless, but systems use ICMP unreachable messages to
indicate there is no ability to reach the host or port, which can result
in a `send()` returning an error like `ECONNREFUSED`. We need to handle
non-retryable codes like that to treat it as a connection failure so we
requeue any queries on that connection to another connection/server
immediately. Otherwise what happens is we just wait on the timeout to
expire which can greatly increase the time required to get a definitive
message.
This also adds a test case to verify the behavior.
Fixes#819
Fix By: Brad Houes (@bradh352)
c-ares utilizes recursion for some operations, and some of these
processes can have unintended side effects, such as if a callback
is called that then recurses into the same function. This can cause
strange cleanup conditions that lead to crashes.
Try to disassociate queries with connections as early as possible and
move cleaning up unneeded connections to its own scan rather than
trying to detect each time a query is disassociated from a connection.
Fix By: Brad House (@bradh352)
In c-ares 1.30.0 we started validating strings parsed are printable.
This caused a regression in a pycares test case due to a wrong response
code being returned as the error was being propagated from a different
section of code that was assuming the only possible failure condition
was out-of-memory.
This PR adds a fix for this and also a test case to validate it.
Ref: https://github.com/saghul/pycares/issues/200
Fix By: Brad House (@bradh352)
We've had reports of user-after-free type crashes in Windows cleanup
code for the Event Thread. In evaluating the code, it appeared there
were some memory leaks on per-connection handles that may have remained
open during shutdown, while trying to resolve that it became apparent
the methodology chosen may not have been the right one for interfacing
with the Windows AFD system as stability issues were seen during this
debugging process.
Since this system is completely undocumented, there was no clear
resolution path other than to switch to the *other* methodology which
involves directly opening `\Device\Afd`, rather than spawning a "peer
socket" to use to queue AFD operations.
The original methodology chosen more closely resembled what is employed
by [libuv](https://github.com/libuv/libuv) and given its widespread use
was the reason it was used. The new methodology more closely resembles
[wepoll](https://github.com/piscisaureus/wepoll).
Its not clear if there are any scalability or performance advantages or
disadvantages for either method. They both seem like different ways to
do the same thing, but this current way does seem more stable.
Fixes#798
Fix By: Brad House (@bradh352)
AppVeyor has gotten progressively slower over the last year or so. Move
some Windows builds to GitHub actions which is considerably faster.
Fix By: Brad House (@bradh352)
The query cache should be enabled by default. This will help with
determining proper timeouts for #736. It can still be disabled by
setting the ttl to 0. There should be no negative consequences of this
in real-world scenarios since DNS is based on the TTL concept and
upstream servers will cache results and not recurse based on this
information anyhow. DNS queries and responses are very small, this
should have negligible impact on memory consumption.
Fix By: Brad House (@bradh352)
On some broken MacOS SDK versions, if HAVE_UNISTD_H is defined, it
must be defined to 1, otherwise it will cause build errors. Autotools
always defines to 1, but CMake doesn't.
Fixes Issue: #787
Fix By: Brad House (@bradh352)
The `NotifyIpInterfaceChange()` method only notifies of automatic
changes, such as via DHCP or interfaces going up or down. However,
someone editing the DNS settings manually would not trigger a
notification.
Use `RegNotifyChangeKeyValue()` to monitor
`HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters\Interfaces`
and `HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces`
to pick up these changes.
This new check also works when using the OpenWatcom compiler as the
symbols are available, so they have been enabled (though
`NotifyIpInterfaceChange()` method is disabled for OpenWatcom still due
to the linker libraries missing this symbol).
NOTE: it does appear that about 5 events get triggered for some reason,
it doesn't seem to hurt anything, but it will actually cause it to
re-read the configuration multiple times.
Fixes Issue: #761
Fix By: Brad House (@bradh352)
Add code annotations for ignoring specific code paths for coverage
calculations. The primary purpose of this is to make it easy to see the
code paths that we could (and probably should) write test cases for, as
these would have the most impact on delivery of a stable product.
The annotations used are:
`LCOV_EXCL_LINE: <designation>`, `LCOV_EXCL_START: <designation>`,
`LCOV_EXCL_STOP`
Unfortunately `LCOV_EXCL_BR_LINE` does not appear to be supported by
coveralls as it would have been a more elegant solution over START/STOP.
We specifically include the `<designation>` not just for future
reference but because it makes it easy to identify in case we want to
address these conditions in a different way in the future.
The main areas designated for exclusion are:
1. `OutOfMemory` - these are hard to test cases, and on modern systems,
are likely to never occur due to optimistic memory allocations, which
can then later cause the kernel to terminate your application due to
memory not actually being available. c-ares does have *some* testing
framework for this, if we wish to expand in the future, we can easily
use sed to get rid of of these annotations.
2. `DefensiveCoding` - these are impossible to reach paths at the point
in time the code was written. They are there for defensive coding in
case code is refactored in the future to prevent unexpected behavior.
3. `UntestablePath` - these are code paths that aren't possible to test,
such as failure of a system call.
4. `FallbackCode` - This is an entire set of code that is untestable
because its not able to simulate a failure of the primary path.
This PR also does add some actual coverage in the test cases where it is
easy to do.
Fix By: Brad House (@bradh352)