c-ares

Author	SHA1	Message	Date
Brad House	70f10a85f3	DNS 0x20 implementation (#800) This PR enables DNS 0x20 as per https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00. DNS 0x20 adds additional entropy to the request by randomly altering the case of the DNS question to help prevent cache poisoning attacks. Google DNS has implemented this support as of 2023, even though this is a proposed and expired standard from 2008: https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M. There have been documented cases of name server and caching server non-conformance, though these are expected to become rarer, especially since Google has started using this. This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently disabled by default. The test cases do, however, enable this flag to validate this feature. Implementors using this flag will notice that responses will retain the mixed case, but since DNS names are case-insensitive, any proper implementation should not be impacted. There is currently no fallback mechanism implemented, as it isn't immediately clear how this may affect a stub resolver like c-ares, where we aren't querying the authoritative name server but instead an intermediate recursive resolver, where some domains may return invalid results while others return valid results, all while querying the same nameserver. Using DNS cookies as suggested by #620 is likely a better mechanism to fight cache poisoning attacks for stub resolvers. TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag is specified, since they are not subject to cache poisoning attacks. Fixes Issue: #795 Fix By: Brad House (@bradh352)	5 months ago
Brad House	1dc567d9c0	fix wording in docs	5 months ago
Brad House	209e7077bb	thread safety enhancements	5 months ago
Brad House	93aa939169	thread deadlock: make sure channel lock isn't used recursively	5 months ago
Brad House	a488525f08	Automatic query timeout adjustment based on server history (#794) With very little effort we should be able to determine fairly accurate timeouts based on prior query history. We keep tracking in order to be able to auto-scale when network conditions change (e.g. maybe there is a provider failover and timings change due to that). Apple appears to do this within their system resolver in macOS. Obviously we should have a minimum, maximum, and initial value to make sure the algorithm doesn't somehow go off the rails. Values: - Minimum Timeout: 250ms (approximate RTT half-way around the globe) - Maximum Timeout: 5000ms (recommended timeout in RFC 1123), can be reduced by ARES_OPT_MAXTIMEOUTMS, but otherwise the bound specified by the option caps the retry timeout. - Initial Timeout: User-specified via configuration or ARES_OPT_TIMEOUTMS - Average latency multiplier: 5x (a local DNS server returning a cached value will be quicker than if it needs to recurse, so we need to account for this) - Minimum Count for Average: 3. This is the minimum number of queries we need to form an average for the bucket. Per-server buckets for tracking latency over time (these are ephemeral, meaning they don't persist once a channel is destroyed). We record both the current timespan for the bucket and the immediately preceding timespan so that, in case of roll-overs, we can still maintain recent metrics for calculations: - 1 minute - 15 minutes - 1 hr - 1 day - since inception Each bucket contains: - timestamp (divided by interval) - minimum latency - maximum latency - total time - count NOTE: average latency is (total time / count); we will calculate this dynamically when needed. Basic algorithm for calculating the timeout to use: - Scan from most recent bucket to least recent - Check the timestamp of the bucket; if it doesn't match the current time, continue to the next bucket - Check the count of the bucket; if it's not at least the "Minimum Count for Average", check the previous bucket, otherwise continue to the next bucket - If we reached the end with no bucket match, use "Initial Timeout" - If a bucket is selected, take ("total time" / count) as the average latency, multiply by "Average Latency Multiplier", and bound by "Minimum Timeout" and "Maximum Timeout" NOTE: The timeout calculated may not be the timeout used. If we are retrying the query on the same server another time, then it will use a larger value. On each query reply where the response is legitimate (proper response or NXDOMAIN) and not something like a server error: - Cycle through each bucket in order - Check the timestamp of the bucket against the current timestamp; if out of date, overwrite the previous entry with the current values and clear the current values - Compare the current minimum and maximum recorded latency against the query time and adjust if necessary - Increment "count" by 1 and "total time" by the query time Other Notes: - This is always-on; the only user-configurable value is the initial timeout, which simply reuses the current option. - Minimum and maximum latencies for a bucket are currently unused but are there in case we find a need for them in the future. Fixes Issue: #736 Fix By: Brad House (@bradh352)	5 months ago
Brad House	4248c642d2	Enable QueryCache by default (#786) The query cache should be enabled by default. This will help with determining proper timeouts for #736. It can still be disabled by setting the ttl to 0. There should be no negative consequences of this in real-world scenarios since DNS is based on the TTL concept and upstream servers will cache results and not recurse based on this information anyhow. DNS queries and responses are very small, so this should have negligible impact on memory consumption. Fix By: Brad House (@bradh352)	5 months ago
Brad House	8d80486e04	Auto reload config on changes (requires EventThread) (#759) Automatically detect configuration changes and reload. On systems which provide notification mechanisms, use those; otherwise fall back to polling. When a system configuration change is detected, it asynchronously applies the configuration in order to ensure it is a non-blocking operation for any queries which may still be being processed. On Windows, however, changes aren't detected if a user manually sets/changes the DNS servers on an interface; there doesn't appear to be any mechanism capable of this. We are relying on `NotifyIpInterfaceChange()` for notifications. Fixes Issue: #613 Fix By: Brad House (@bradh352)	6 months ago
Brad House	a2efab6c75	manpage: remove AUTHOR section The current best practices consider the AUTHOR section to be deprecated and recommend removing such a section.	7 months ago
Oliver Welsh	fd81f36d3e	Add server failover retry behavior, where failed servers are retried with small probability after a minimum delay (#731) Summary By default c-ares will select the server with the fewest consecutive failures when sending a query. However, this means that if a server temporarily goes down and hits failures (e.g. a transient network issue), then that server will never be retried until all other servers hit the same number of failures. This is an issue if the failed server is preferred to other servers in the list, for example if a primary server and a backup server are configured. This PR adds new server failover retry behavior, where failed servers are retried with small probability after a minimum delay has passed. The probability and minimum delay are configurable via the `ARES_OPT_SERVER_FAILOVER` option. By default c-ares will use a probability of 10% and a minimum delay of 5 seconds. In addition, this PR includes a small change to always close out connections to servers which have hit failures, even with `ARES_FLAG_STAYOPEN`. It's possible that resetting the connection can resolve some server issues (e.g. by resetting the source port). Testing A new set of regression tests has been added to test the new server failover retry behavior. Fixes Issue: #717 Fix By: Oliver Welsh (@oliverwelsh)	7 months ago
Brad House	458c937213	Allow configuration value for NDots to be zero (#735) As per Issue #734 some people use `ndots:0` in their configuration which is allowed by the system resolver but not by c-ares. Add support for `ndots:0` and add a test case to validate this behavior. Fixes Issue: #734 Fix By: Brad House (@bradh352)	8 months ago
Oliver Welsh	035c4c3776	Add flag to not use a default local named server on channel initialization (#713) Hello, I work on an application for Microsoft which uses c-ares to perform DNS lookups. We have made some minor changes to the library over time, and would like to contribute these back to the project in case they are useful more widely. This PR adds a new channel init flag, described below. Please let me know if I can include any more information to make this PR better/easier for you to review. Thanks! Summary When initializing a channel with `ares_init_options()`, if there are no nameservers available (because `ARES_OPT_SERVERS` is not used and `/etc/resolv.conf` is either empty or not available) then a default local named server will be added to the channel. However, in some applications a local named server will never be available. In this case, all subsequent queries on the channel will fail. If we know this ahead of time, then it may be preferred to fail channel initialization directly rather than wait for the queries to fail. This gives better visibility, since we know that the failure is due to missing servers rather than something going wrong with the queries. This PR adds a new flag `ARES_FLAG_NO_DFLT_SVR`, to indicate that a default local named server should not be added to a channel in this scenario. Instead, a new error `ARES_EINITNOSERVER` is returned and initialization fails. Testing I have added 2 new FV tests: - `ContainerNoDfltSvrEmptyInit` to test that initialization fails when no nameservers are available and the flag is set. - `ContainerNoDfltSvrFullInit` to test that initialization still succeeds when the flag is set but other nameservers are available. Existing FVs are all passing. Documentation I have had a go at manually updating the docs to describe the new flag/error, but couldn't see any contributing guidance about testing this. Please let me know if you'd like anything more here. --------- Fix By: Oliver Welsh (@oliverwelsh)	9 months ago
Brad House	58e029f332	make docs match PR #705	10 months ago
Andriy Utkin	71b413d804	docs/ares_init_options.3: fix args in analogy (#701) Fix By: Andriy Utkin <hello@autkin.net>	10 months ago
Brad House	7963c519fc	Event Subsystem: No longer require integrators to have their own (#696) This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events. It implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or Windows threads. If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used. Fixes Bug: #611 Fix By: Brad House (@bradh352)	10 months ago
Brad House	0529f6f1dc	fix doc typo	1 year ago
Brad House	a9442bd828	Basic Thread Safety (#636) c-ares does not have any concept of thread-safety. It has always been 100% up to the implementor to ensure they never call c-ares from more than one thread at a time. This patch adds basic thread-safety support, which can be disabled at compile time if not desired. It uses a single recursive mutex per channel, which should be extremely quick when uncontested so overhead should be minimal. Fixes Bug: #610 Also sets the stage to implement #611 Fix By: Brad House (@bradh352)	1 year ago
Christian Clauss	054f474a29	Fix typos discovered by codespell (#634) % `codespell --ignore-words-list="aas,aci,acter,atleast,contentss,firey,fo,sais,seh,statics"` * https://pypi.org/project/codespell Fix By: Christian Clauss (@cclauss)	1 year ago
Brad House	4982f76a2f	Query Cache support (#625) This PR implements a query cache at the lowest possible level, the actual DNS request and response messages. Only successful and `NXDOMAIN` responses are cached. The lowest TTL in the response message determines the cache validity period for the response, and is capped at the configuration value for `qcache_max_ttl`. For `NXDOMAIN` responses, the SOA record is evaluated. For a query to match the cache, the opcode, flags, and each question's class, type, and name are all evaluated. This is to prevent matching a cached entry for a subtly different query (such as if the RD flag is set on one request and not another). For things like ares_getaddrinfo() or ares_search() that may spawn multiple queries, each individual message received is cached rather than the overarching response. This makes it possible for one query in the sequence to be purged from the cache while others still return cached results, which means there is no chance of ever returning stale data. We have had a lot of user requests to return TTLs on all the various parsers like `ares_parse_caa_reply()`, and likely this is because they want to implement caching mechanisms of their own, thus this PR should solve those issues as well. Due to the internal data structures we have these days, this PR is less than 500 lines of new code. Fixes #608 Fix By: Brad House (@bradh352)	1 year ago
Brad House	5159314031	Release 1.22.0 (#616)	1 year ago
Brad House	4acd5759e9	Slight fixes for PR #615 1. the maxtimeout must come at the end of the structure 2. fix comment form to be C style 3. fix timeplus randomness if statement	1 year ago
Ignat	7a140cb478	Randomize retry penalties to prevent thundering herd type issues (#606) The retry timeout values were using a fixed calculation which could cause multiple simultaneous queries to time out and retry at the exact same time. If a DNS server is throttling requests, this could cause the issue to never self-resolve due to all requests recurring at the same instant again. This PR also creates a maximum timeout option to make sure the random value selected does not exceed this value. Fix By: Ignat (@Kontakter)	1 year ago
Brad House	9037340ef6	Mark a couple of parameters as const in the public API	1 year ago
Brad House	4ec6b5ce4c	Use EDNS by default (#596) All DNS servers support EDNS; using it by default will allow larger responses without the need to switch to TCP. If by chance a DNS server is hit that doesn't support EDNS, this is detected due to the lack of the OPT RR in the response and the query will be automatically retried without EDNS. Fix By: Brad House (@bradh352)	1 year ago
Brad House	d2389cd3b7	`ares_channel` -> `ares_channel_t *`: don't bury the pointer (#595) `ares_channel` is defined as `typedef struct ares_channeldata *ares_channel;`. The problem with this is that it embeds the pointer into the typedef, which means an `ares_channel` can never be declared as `const`: if you write `const ares_channel channel`, that expands to `struct ares_channeldata * const channel` and not `const struct ares_channeldata *channel`. We will now typedef `ares_channel_t` as `typedef struct ares_channeldata ares_channel_t;`, so if you write `const ares_channel_t *channel`, it properly expands to `const struct ares_channeldata *channel`. We are maintaining the old typedef for API compatibility with existing integrations, and due to typedef expansion this should not even cause any compiler warnings for existing code. There are no ABI implications with this change. I could be convinced to keep existing public functions as `ares_channel` if a sufficient argument exists, but internally we really need to make this change for modern best practices. This change will allow us to internally use `const ares_channel_t *` where appropriate. Whether or not we decide to change any public interfaces to use `const` may require further discussion on whether there might be ABI implications (I don't think so, but I'm also not 100% sure what a compiler internally does with `const` when emitting machine code ... I think ABI implications would more likely occur going the opposite direction). FYI, this PR was done via a combination of sed and clang-format; the only manual code change was the addition of the new typedef, and a couple of doc fixes :) Fix By: Brad House (@bradh352)	1 year ago
Daniel Stenberg	125c8a1684	docs: provide better man page references When referring to another c-ares function use \fI function(3) \fP to let the webpage rendering find and cross-link them appropriately. SEE ALSO references should be ".BR name (3),", with a space before the open parenthesis. This helps the manpage to HTML renderer. Closes #565	1 year ago
Brad House	dd93f30082	Configuration option to limit number of UDP queries per ephemeral port (#549) Add a new ARES_OPT_UDP_MAX_QUERIES option with udp_max_queries parameter that can be passed to ares_init_options(). This value defaults to 0 (unlimited) to maintain existing compatibility; any positive number will cause new UDP ephemeral ports to be created once the threshold is reached. We'll call these 'connections' even though it's technically wrong for UDP. Implementation Details: * Each server entry in a channel now has a linked-list of connections/ports for UDP and TCP. The first connection in the list is the one most likely to be eligible to accept new queries. * Queries are now tracked by connection rather than by server. * Every time a query is detached from a connection, the connection that it was attached to will be checked to see if it needs to be cleaned up. * Insertion, lookup, and searching for connections have been implemented with O(1) complexity, so the number of connections will not impact performance. * Remove is_broken from the server; it appears it would be set and immediately unset, so it must have been invalidated by a prior patch. A future patch should probably track consecutive server errors and de-prioritize such servers. The code right now will always try servers in the order of configuration, so a bad server in the list will always be tried and may rely on timeout logic to try the next. * Various other cleanups to remove code duplication and for clarification. Fixes Bug: #444 Fix By: Brad House (@bradh352)	1 year ago
Brad House	7f3262312f	it's not 1991 anymore, lower default timeout and retry count (#542) A lot of time has passed since the original timeouts and retry counts were chosen. We have had issues reported on and off due to this. Even on geostationary satellite links, latency is worst case around 1.5s. This PR changes the per-server timeout to 2s and lowers the retry count from 4 to 3. Fix By: Brad House (@bradh352)	1 year ago
Daniel Stenberg	c1b00c41a7	provide SPDX identifiers and a REUSE CI job to verify All files have their licence and copyright information clearly identifiable. If not in the file header, they are set separately in .reuse/dep5. All used license texts are provided in LICENSES/	1 year ago
Yijie Ma	82c23e4e7e	Fix a typo in ares_init_options.3 (#510) that -> than Fix By: Yijie Ma (@yijiem)	2 years ago
bradh352	a306ed4238	docs: ARES_OPT_UDP_PORT and ARES_OPT_TCP_PORT docs wrong byte order As per #487, documentation states the port should be in network byte order, but we can see from the test cases using MockServers on different ports that this is not the case, it is definitely in host byte order. Fix By: Brad House (@bradh352)	2 years ago
Manish Mehra	810c2322f9	Configurable hosts path for file_lookup (#465) This changeset adds support for configurable hosts file ARES_OPT_HOSTS_FILE (similar to ARES_OPT_RESOLVCONF). Co-authored-by: Manish Mehra (@mmehra)	3 years ago
Brad House	0bf721cdd7	Reorganize source tree (#349) Originally started by Daniel Stenberg (@bagder) with #123, this patch reorganizes the c-ares source tree to have a more modern layout. It also fixes out-of-tree builds for autotools, and automatically builds the tests if tests are enabled. All tests pass with each of the supported build systems (autotools, cmake, nmake, mingw gmake). There may be some edge cases that will have to be caught later on for things I'm not aware of. Fix By: Brad House (@bradh352)	4 years ago
Fionn Fitzmaurice	6d6cd5daf6	Avoid buffer overflow in RC4 loop comparison (#336) The rc4 function iterates over a buffer of size buffer_len, whose maximum value is INT_MAX, with a counter of type short that is not guaranteed to have maximum value INT_MAX. In circumstances where short is narrower than int and where buffer_len is larger than the maximum value of a short, it may be possible to loop infinitely as the counter will overflow and never be greater than or equal to buffer_len. The solution is to make the comparison be between types of equal width. This commit defines counter as an int. Fix By: Fionn Fitzmaurice (@fionn)	4 years ago

32 Commits (c9c235761f68f91269385bb6bc12afdcbad49dc2)
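The DNS 0x20 commit (70f10a85f3) describes randomizing the case of the question name. Below is a minimal, self-contained C sketch of that idea. It is not the c-ares code. The function names are hypothetical. The xorshift32 generator is a stand-in chosen only so the example is deterministic; a real resolver should draw these bits from a cryptographically secure source. XORing an ASCII letter with 0x20 toggles its case, which is where the technique gets its name. A UDP reply is then accepted only if it echoes the query name byte for byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in PRNG (xorshift32), for illustration only; state must be
 * non-zero. Production code should use a CSPRNG for these bits. */
static uint32_t xorshift32(uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Randomly toggle the case of each ASCII letter of a query name in place.
 * Digits, '-' and '.' are skipped because they carry no case entropy. */
void dns0x20_randomize(char *name, uint32_t *state)
{
  uint32_t bits      = 0;
  int      remaining = 0;

  for (size_t i = 0; name[i] != '\0'; i++) {
    char c = name[i];
    if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) {
      continue;
    }
    if (remaining == 0) { /* refill: one random bit per letter */
      bits      = xorshift32(state);
      remaining = 32;
    }
    if (bits & 1u) {
      name[i] = (char)(c ^ 0x20); /* 0x20 is the ASCII case bit */
    }
    bits >>= 1;
    remaining--;
  }
}

/* DNS names compare case-insensitively (ASCII only). */
int name_equal_nocase(const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++) {
    char ca = (*a >= 'A' && *a <= 'Z') ? (char)(*a + 0x20) : *a;
    char cb = (*b >= 'A' && *b <= 'Z') ? (char)(*b + 0x20) : *b;
    if (ca != cb) {
      return 0;
    }
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* With 0x20 enabled, a reply must echo the exact mixed case that was sent.
 * A spoofed reply that guessed the name but not its case fails here. */
int dns0x20_response_matches(const char *sent, const char *received)
{
  return strcmp(sent, received) == 0;
}
```

The commit says TCP queries skip this step. In this sketch, that decision is left to the caller.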
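The timeout algorithm in commit a488525f08 can be sketched in C as follows. This is an illustration under stated assumptions, not the c-ares implementation:

- The struct and function names are hypothetical.
- The minimum/maximum latency fields are omitted, because the commit says they are unused.
- A `max_ms` of 0 is assumed to mean "use the 5000ms default", and a non-zero value is assumed only to lower that cap.
- When no bucket qualifies, the initial timeout is returned unbounded, following the commit's wording literally.

Buckets are passed from shortest interval (1 minute) to "since inception" (interval 0 here).

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MIN_TIMEOUT_MS         250u  /* approx. RTT half-way around the globe */
#define DEFAULT_MAX_TIMEOUT_MS 5000u /* RFC 1123 recommendation */
#define AVG_MULTIPLIER         5u
#define MIN_COUNT_FOR_AVG      3u

/* Hypothetical per-server bucket; min/max latency omitted (unused). */
typedef struct {
  time_t   interval_s;    /* 60, 900, 3600, 86400, or 0 = since inception */
  time_t   ts;            /* current period: now / interval_s */
  uint64_t total_ms;
  unsigned count;
  time_t   prev_ts;       /* immediately preceding period */
  uint64_t prev_total_ms;
  unsigned prev_count;
} latency_bucket_t;

static time_t bucket_period(const latency_bucket_t *b, time_t now)
{
  return b->interval_s ? now / b->interval_s : 0;
}

/* Record one legitimate reply (answer or NXDOMAIN). */
void bucket_record(latency_bucket_t *b, time_t now, unsigned query_ms)
{
  time_t period = bucket_period(b, now);
  if (period != b->ts) { /* roll over: current becomes previous */
    b->prev_ts       = b->ts;
    b->prev_total_ms = b->total_ms;
    b->prev_count    = b->count;
    b->ts            = period;
    b->total_ms      = 0;
    b->count         = 0;
  }
  b->total_ms += query_ms;
  b->count++;
}

/* Scan buckets from most to least recent; the first one with enough
 * samples (current or immediately preceding timespan) sets the timeout. */
unsigned bucket_timeout(const latency_bucket_t *buckets, size_t n, time_t now,
                        unsigned initial_ms, unsigned max_ms)
{
  unsigned cap = DEFAULT_MAX_TIMEOUT_MS;
  if (max_ms != 0 && max_ms < cap) {
    cap = max_ms; /* assumption: a configured maximum only lowers the cap */
  }

  for (size_t i = 0; i < n; i++) {
    const latency_bucket_t *b      = &buckets[i];
    time_t                  period = bucket_period(b, now);
    uint64_t                total;
    unsigned                count;

    if (b->ts == period && b->count >= MIN_COUNT_FOR_AVG) {
      total = b->total_ms; /* current timespan */
      count = b->count;
    } else if (b->interval_s != 0 && b->ts == period - 1 &&
               b->count >= MIN_COUNT_FOR_AVG) {
      total = b->total_ms; /* no reply yet this period: slot is previous */
      count = b->count;
    } else if (b->interval_s != 0 && b->ts == period &&
               b->prev_ts == period - 1 &&
               b->prev_count >= MIN_COUNT_FOR_AVG) {
      total = b->prev_total_ms; /* preceding timespan after a roll-over */
      count = b->prev_count;
    } else {
      continue;
    }

    uint64_t timeout = (total / count) * AVG_MULTIPLIER;
    if (timeout < MIN_TIMEOUT_MS) {
      timeout = MIN_TIMEOUT_MS;
    }
    if (timeout > cap) {
      timeout = cap;
    }
    return (unsigned)timeout;
  }
  return initial_ms; /* no bucket qualified: use the initial timeout */
}
```

Per the commit, the value computed this way may still be raised when a query is retried on the same server.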
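Commit fd81f36d3e describes retrying a failed server with a small probability once a minimum delay has passed. Below is a hedged C sketch of that selection rule. The types and function names are hypothetical. The 10% chance and 5 second delay are the defaults quoted in the commit. The random draw is passed in as `roll` (a value in [0, 1)) so the behavior is deterministic to test. How c-ares actually orders candidate servers and draws its random numbers may differ.

```c
#include <stddef.h>

#define FAILOVER_RETRY_CHANCE 0.10 /* default probability from the commit */
#define FAILOVER_RETRY_DELAY  5.0  /* default minimum delay, in seconds */

/* Hypothetical server record; the array is in configuration order. */
typedef struct {
  unsigned consec_failures;
  double   next_retry_time; /* earliest time a failed server may be probed */
} server_t;

/* On failure, bump the counter and hold the server off for the delay. */
void server_mark_failed(server_t *s, double now)
{
  s->consec_failures++;
  s->next_retry_time = now + FAILOVER_RETRY_DELAY;
}

void server_mark_ok(server_t *s)
{
  s->consec_failures = 0;
}

/* Choose a server index (n must be > 0); roll is uniform in [0, 1). */
size_t select_server(const server_t *servers, size_t n, double now, double roll)
{
  size_t best = 0;

  /* Default rule: fewest consecutive failures; ties keep config order. */
  for (size_t i = 1; i < n; i++) {
    if (servers[i].consec_failures < servers[best].consec_failures) {
      best = i;
    }
  }

  /* With small probability, probe a failed server whose delay has elapsed. */
  if (roll < FAILOVER_RETRY_CHANCE) {
    for (size_t i = 0; i < n; i++) {
      if (servers[i].consec_failures > 0 && now >= servers[i].next_retry_time) {
        return i;
      }
    }
  }
  return best;
}
```

With a failed primary at index 0 and a healthy backup at index 1, most queries go to the backup. About one in ten queries after the delay would probe the primary again.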
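Commit 4982f76a2f states which parts of a message form the cache key and how the cache lifetime is chosen. The C sketch below illustrates both under these assumptions:

- The types and functions are hypothetical.
- Names are used exactly as given; any case normalization c-ares may apply is not shown.
- An empty record set is treated as uncacheable.
- The NXDOMAIN lifetime uses the RFC 2308 rule of taking the smaller of the SOA record's TTL and its MINIMUM field. The commit only says that the SOA record is evaluated.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical question record. */
typedef struct {
  const char *name;
  unsigned    qtype;
  unsigned    qclass;
} question_t;

/* Build a key from the opcode, header flags and every question, so that
 * e.g. a query with RD set never matches one without it.
 * Returns 0 on success, -1 if the key did not fit in out. */
int qcache_key(char *out, size_t outlen, unsigned opcode, unsigned flags,
               const question_t *qs, size_t nq)
{
  int len = snprintf(out, outlen, "%u|%u", opcode, flags);
  for (size_t i = 0; i < nq; i++) {
    if (len < 0 || (size_t)len >= outlen) {
      return -1;
    }
    int w = snprintf(out + len, outlen - (size_t)len, "|%u|%u|%s",
                     qs[i].qclass, qs[i].qtype, qs[i].name);
    if (w < 0) {
      return -1;
    }
    len += w;
  }
  return (len >= 0 && (size_t)len < outlen) ? 0 : -1;
}

/* Positive answer: lowest record TTL, capped at max_ttl (qcache_max_ttl).
 * Assumption: no records at all yields 0, i.e. "do not cache". */
unsigned qcache_ttl(const unsigned *ttls, size_t n, unsigned max_ttl)
{
  if (n == 0) {
    return 0;
  }
  unsigned ttl = max_ttl;
  for (size_t i = 0; i < n; i++) {
    if (ttls[i] < ttl) {
      ttl = ttls[i];
    }
  }
  return ttl;
}

/* NXDOMAIN: assumed RFC 2308 rule, min(SOA TTL, SOA MINIMUM), capped. */
unsigned qcache_nxdomain_ttl(unsigned soa_ttl, unsigned soa_minimum,
                             unsigned max_ttl)
{
  unsigned ttl = soa_ttl < soa_minimum ? soa_ttl : soa_minimum;
  return ttl < max_ttl ? ttl : max_ttl;
}
```

Keying on the header flags is what keeps an RD and a non-RD query apart, which is the "subtly different query" case the commit mentions.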
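The `const` problem described in commit d2389cd3b7 is easiest to see in a small compilable example. Here `struct ares_channeldata` is given a made-up `ndots` member purely for illustration; in c-ares the structure is opaque to users. The two typedefs mirror the ones quoted in the commit message.

```c
/* Stand-in definition for illustration only; the real struct is opaque. */
struct ares_channeldata {
  int ndots;
};

typedef struct ares_channeldata *ares_channel;   /* legacy: pointer hidden */
typedef struct ares_channeldata  ares_channel_t; /* new: pointer explicit */

/* "const ares_channel c" means "struct ares_channeldata *const c":
 * the pointer is const, but the channel it points to is still writable. */
int legacy_bump_ndots(const ares_channel channel)
{
  channel->ndots++; /* compiles */
  return channel->ndots;
}

/* "const ares_channel_t *c" makes the channel contents read-only. */
int get_ndots(const ares_channel_t *channel)
{
  /* channel->ndots++; would be rejected by the compiler here */
  return channel->ndots;
}
```

`ares_channel` and `ares_channel_t *` name the same type, so existing code that passes an `ares_channel` keeps compiling. This is the compatibility argument the commit makes.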
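Commit 6d6cd5daf6 changes the RC4 loop counter from `short` to `int`. The sketch below is a standard RC4 (key schedule plus keystream XOR) written for illustration, not copied from c-ares; the struct layout and names are assumptions. The relevant detail is that `counter` has the same type as `buffer_len`, so it can reach every length the caller can pass. With a `short` counter and a length above `SHRT_MAX`, the counter would typically wrap to a negative value before the loop condition ever failed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical RC4 state: permutation plus the two indices. */
typedef struct {
  unsigned char state[256];
  unsigned char x;
  unsigned char y;
} rc4_key;

static void swap_bytes(unsigned char *a, unsigned char *b)
{
  unsigned char t = *a;
  *a = *b;
  *b = t;
}

/* Key-scheduling algorithm (KSA); keylen must be > 0. */
void rc4_init(rc4_key *key, const unsigned char *keydata, size_t keylen)
{
  unsigned char j = 0;
  for (int i = 0; i < 256; i++) {
    key->state[i] = (unsigned char)i;
  }
  for (int i = 0; i < 256; i++) {
    j = (unsigned char)(j + key->state[i] + keydata[(size_t)i % keylen]);
    swap_bytes(&key->state[i], &key->state[j]);
  }
  key->x = 0;
  key->y = 0;
}

/* XOR the keystream over buffer. counter is an int, the same width as
 * buffer_len, so the loop terminates for every non-negative length. */
void rc4(rc4_key *key, unsigned char *buffer, int buffer_len)
{
  unsigned char  x     = key->x;
  unsigned char  y     = key->y;
  unsigned char *state = key->state;

  for (int counter = 0; counter < buffer_len; counter++) {
    x = (unsigned char)(x + 1);
    y = (unsigned char)(y + state[x]);
    swap_bytes(&state[x], &state[y]);
    buffer[counter] ^= state[(unsigned char)(state[x] + state[y])];
  }
  key->x = x;
  key->y = y;
}
```

Applying the same keystream twice restores the input, since RC4 encryption and decryption are the same XOR operation.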