This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events.
Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads.
If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used.
Fixes Bug: #611
Fix By: Brad House (@bradh352)
c-ares does not have any concept of thread-safety. It has always been 100% up to the implementor to ensure they never call c-ares from more than one thread at a time. This patch adds basic thread-safety support, which can be disabled at compile time if not desired. It uses a single recursive mutex per channel, which should be extremely quick when uncontested so overhead should be minimal.
Fixes Bug: #610
Also sets the stage to implement #611
Fix By: Brad House (@bradh352)
This PR implements a query cache at the lowest possible level, the actual dns request and response messages. Only successful and `NXDOMAIN` responses are cached. The lowest TTL in the response message determines the cache validity period for the response, and is capped at the configuration value for `qcache_max_ttl`. For `NXDOMAIN` responses, the SOA record is evaluated.
For a query to match the cache, the opcode, flags, and each question's class, type, and name are all evaluated. This is to prevent matching a cached entry for a subtly different query (such as if the RD flag is set on one request and not another).
For things like ares_getaddrinfo() or ares_search() that may spawn multiple queries, each individual message received is cached rather than the overarching response. This makes it possible for one query in the sequence to be purged from the cache while others still return cached results which means there is no chance of ever returning stale data.
We have had a lot of user requests to return TTLs on all the various parsers like `ares_parse_caa_reply()`, and likely this is because they want to implement caching mechanisms of their own, thus this PR should solve those issues as well.
Due to the internal data structures we have these days, this PR is less than 500 lines of new code.
Fixes#608
Fix By: Brad House (@bradh352)
The retry timeout values were using a fixed calculation which could cause multiple simultaneous queries to timeout and retry at the exact same time. If a DNS server is throttling requests, this could cause the issue to never self-resolve due to all requests recurring at the same instance again.
This PR also creates a maximum timeout option to make sure the random value selected does not exceed this value.
Fix By: Ignat (@Kontakter)
All DNS servers support EDNS, by using this by default, it will allow larger responses without the need to switch to TCP. If by chance a DNS server is hit that doesn't support EDNS, this is detected due to the lack of the OPT RR in the response and will be automatically retried without EDNS.
Fix By: Brad House (@bradh352)
`ares_channel` is defined as `typedef struct ares_channeldata *ares_channel;`. The problem with this, is it embeds the pointer into the typedef, which means an `ares_channel` can never be declared as `const` as if you write `const ares_channel channel`, that expands to `struct ares_channeldata * const ares_channel` and not `const struct ares_channeldata *channel`.
We will now typedef `ares_channel_t` as `typedef struct ares_channeldata ares_channel_t;`, so if you write `const ares_channel_t *channel`, it properly expands to `const struct ares_channeldata *channel`.
We are maintaining the old typedef for API compatibility with existing integrations, and due to typedef expansion this should not even cause any compiler warnings for existing code. There are no ABI implications with this change. I could be convinced to keep existing public functions as `ares_channel` if a sufficient argument exists, but internally we really need make this change for modern best practices.
This change will allow us to internally use `const ares_channel_t *` where appropriate. Whether or not we decide to change any public interfaces to use `const` may require further discussion on if there might be ABI implications (I don't think so, but I'm also not 100% sure what a compiler internally does with `const` when emitting machine code ... I think more likely ABI implications would occur going the opposite direction).
FYI, This PR was done via a combination of sed and clang-format, the only manual code change was the addition of the new typedef, and a couple doc fixes :)
Fix By: Brad House (@bradh352)
When referring to another c-ares function use \fI function(3) \fP to let
the webpage rendering find and cross-link them appropriately.
SEE ALSO references should be ".BR name (3),", with a space before the
open parenthesis. This helps the manpage to HTML renderer.
Closes#565
Add a new ARES_OPT_UDP_MAX_QUERIES option with udp_max_queries parameter that can be passed to ares_init_options(). This value defaults to 0 (unlimited) to maintain existing compatibility, any positive number will cause new UDP ephemeral ports to be created once the threshold is reached, we'll call these 'connections' even though its technically wrong for UDP.
Implementation Details:
* Each server entry in a channel now has a linked-list of connections/ports for udp and tcp. The first connection in the list is the one most likely to be eligible to accept new queries.
* Queries are now tracked by connection rather than by server.
* Every time a query is detached from a connection, the connection that it was attached to will be checked to see if it needs to be cleaned up.
* Insertion, lookup, and searching for connections has been implemented as O(1) complexity so the number of connections will not impact performance.
* Remove is_broken from the server, it appears it would be set and immediately unset, so must have been invalidated via a prior patch. A future patch should probably track consecutive server errors and de-prioritize such servers. The code right now will always try servers in the order of configuration, so a bad server in the list will always be tried and may rely on timeout logic to try the next.
* Various other cleanups to remove code duplication and for clarification.
Fixes Bug: #444
Fix By: Brad House (@bradh352)
A lot of time has passed since the original timeouts and retry counts were chosen. We have on and off issues reported due to this. Even on geostationary satellite links, latency is worst case around 1.5s. This PR changes the per-server timeout to 2s and the retry count lowered from 4 to 3.
Fix By: Brad House (@bradh352)
All files have their licence and copyright information clearly
identifiable. If not in the file header, they are set separately in
.reuse/dep5.
All used license texts are provided in LICENSES/
As per #487, documentation states the port should be in network byte
order, but we can see from the test cases using MockServers on
different ports that this is not the case, it is definitely in host
byte order.
Fix By: Brad House (@bradh352)
Originally started by Daniel Stenberg (@bagder) with #123, this patch reorganizes the c-ares source tree to have a more modern layout. It also fixes out of tree builds for autotools, and automatically builds the tests if tests are enabled. All tests are passing which tests each of the supported build systems (autotools, cmake, nmake, mingw gmake). There may be some edge cases that will have to be caught later on for things I'm not aware of.
Fix By: Brad House (@bradh352)
The rc4 function iterates over a buffer of size buffer_len who's maximum
value is INT_MAX with a counter of type short that is not guaranteed to
have maximum size INT_MAX.
In circumstances where short is narrower than int and where buffer_len
is larger than the maximum value of a short, it may be possible to loop
infinitely as counter will overflow and never be greater than or equal
to buffer_len.
The solution is to make the comparison be between types of equal width.
This commit defines counter as an int.
Fix By: Fionn Fitzmaurice (@fionn)