c-ares

Commit Graph

Author	SHA1	Message	Date
Brad House	8f40969d4f	fix build failure in tests	4 months ago
Brad House	23a400bcfb	tests: Disable LotsOfConnections test Disable a test meant for Windows event load testing. It's not meant to be something for general testing. Fix By: Brad House (@bradh352)	4 months ago
Brad House	e9e23d4f37	tests: reduce required testing time for ServerFailoverOpts on most platforms	4 months ago
Brad House	237894021f	test: bypass BadLoopbackServerNoTimeouts strict validation on NetBSD	4 months ago
Brad House	0810f6752d	test: ServerFailoverOpts can fail on heavily loaded systems due to its reliance on sleep and time. Try to harden it a little bit	4 months ago
Brad House	ccd11aa377	UDP write may fail indicating host isn't reachable (#821 ) UDP is connectionless, but systems use ICMP unreachable messages to indicate there is no ability to reach the host or port, which can result in a `send()` returning an error like `ECONNREFUSED`. We need to handle non-retryable codes like that to treat it as a connection failure so we requeue any queries on that connection to another connection/server immediately. Otherwise what happens is we just wait on the timeout to expire which can greatly increase the time required to get a definitive message. This also adds a test case to verify the behavior. Fixes #819 Fix By: Brad House (@bradh352)	4 months ago
Brad House	e8b32b864f	Prevent complex recursion during query requeuing and connection cleanup c-ares utilizes recursion for some operations, and some of these processes can have unintended side effects, such as if a callback is called that then recurses into the same function. This can cause strange cleanup conditions that lead to crashes. Try to disassociate queries with connections as early as possible and move cleaning up unneeded connections to its own scan rather than trying to detect each time a query is disassociated from a connection. Fix By: Brad House (@bradh352)	4 months ago
Brad House	47be750b3a	Issue #819 : preliminary test case	4 months ago
Brad House	a9bc0a2dee	propagate actual error condition on requeue	4 months ago
Brad House	4ecf4855bb	Rework WinAFD event code (#811 ) We've had reports of use-after-free type crashes in Windows cleanup code for the Event Thread. In evaluating the code, it appeared there were some memory leaks on per-connection handles that may have remained open during shutdown. While trying to resolve that, it became apparent the methodology chosen may not have been the right one for interfacing with the Windows AFD system, as stability issues were seen during this debugging process. Since this system is completely undocumented, there was no clear resolution path other than to switch to the other methodology which involves directly opening `\Device\Afd`, rather than spawning a "peer socket" to use to queue AFD operations. The original methodology chosen more closely resembled what is employed by [libuv](https://github.com/libuv/libuv) and given its widespread use was the reason it was used. The new methodology more closely resembles [wepoll](https://github.com/piscisaureus/wepoll). It's not clear if there are any scalability or performance advantages or disadvantages for either method. They both seem like different ways to do the same thing, but this current way does seem more stable. Fixes #798 Fix By: Brad House (@bradh352)	4 months ago
Brad House	93a2627866	tests: use std::chrono instead of pulling in ares__tvnow and ares__timeval_remaining (#809 ) This will allow more tests to run even when internal symbols aren't accessible. Fix By: Brad House (@bradh352)	5 months ago
Brad House	614bdd88b9	Tests: fix test cleanup race condition (#803 ) There was a thread passed data for processing that was cleaned up before thread exit, and it could cause a use-after-free in the test suite. This doesn't affect c-ares. This was found during trying to reproduce #798, but appears unrelated, don't use a helper thread as it isn't necessary. Fix By: Brad House (@bradh352)	5 months ago
Brad House	70f10a85f3	DNS 0x20 implementation (#800 ) This PR enables DNS 0x20 as per https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 . DNS 0x20 adds additional entropy to the request by randomly altering the case of the DNS question to help prevent cache poisoning attacks. Google DNS has implemented this support as of 2023, even though this is a proposed and expired standard from 2008: https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M There have been documented cases of name server and caching server non-conformance, though it is expected to become more rare, especially since Google has started using this. This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently disabled by default. The test cases do however enable this flag to validate this feature. Implementors using this flag will notice that responses will retain the mixed case, but since DNS names are case-insensitive, any proper implementation should not be impacted. There is currently no fallback mechanism implemented as it isn't immediately clear how this may affect a stub resolver like c-ares where we aren't querying the authoritative name server, but instead an intermediate recursive resolver where some domains may return invalid results while others return valid results, all while querying the same nameserver. Likely using DNS cookies as suggested by #620 is a better mechanism to fight cache poisoning attacks for stub resolvers. TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag is specified since they are not subject to cache poisoning attacks. Fixes Issue: #795 Fix By: Brad House (@bradh352)	5 months ago
Brad House	7ea18a83b3	test: clean up some minor warnings	5 months ago
Oliver Welsh	09e82e05a3	Improve reliability in the server retry delay regression tests (#747 ) Improve reliability in the server retry delay regression tests by increasing the retry delay and sleeping for a little more than the retry delay when attempting to force retries. This helps to account for unreliable timing (e.g. NTP slew) intermittently breaking pipelines. Fix By: Oliver Welsh (@oliverwelsh)	7 months ago
Oliver Welsh	fd81f36d3e	Add server failover retry behavior, where failed servers are retried with small probability after a minimum delay (#731 ) Summary: By default c-ares will select the server with the least number of consecutive failures when sending a query. However, this means that if a server temporarily goes down and hits failures (e.g. a transient network issue), then that server will never be retried until all other servers hit the same number of failures. This is an issue if the failed server is preferred to other servers in the list, for example if a primary server and a backup server are configured. This PR adds new server failover retry behavior, where failed servers are retried with small probability after a minimum delay has passed. The probability and minimum delay are configurable via the `ARES_OPT_SERVER_FAILOVER` option. By default c-ares will use a probability of 10% and a minimum delay of 5 seconds. In addition, this PR includes a small change to always close out connections to servers which have hit failures, even with `ARES_FLAG_STAYOPEN`. It's possible that resetting the connection can resolve some server issues (e.g. by resetting the source port). Testing: A new set of regression tests has been added to test the new server failover retry behavior. Fixes Issue: #717 Fix By: Oliver Welsh (@oliverwelsh)	7 months ago
Brad House	fed3559cfc	Add ares_queue_wait_empty() for use with EventThreads (#710 ) It may be useful to wait for the queue to be empty under certain conditions (mainly test cases), expose a function to efficiently do this and rework test cases to use it. Fix By: Brad House (@bradh352)	10 months ago
Brad House	0e4c0f2600	build-time disabled threads breaks c-ares (#700 ) Regression introduced in 1.26.0, building c-ares with threading disabled results in ares_init{_options}() failing. Also adds a new CI test case to prevent this regression in the future. Fixes Bug: #699 Fix By: Brad House (@bradh352)	10 months ago
Brad House	7963c519fc	Event Subsystem: No longer require integrators to have their own (#696 ) This PR implements an event thread to process all events on file descriptors registered by c-ares. Prior to this feature, integrators were required to understand the internals of c-ares and how to monitor file descriptors and timeouts and process events. Implements OS-specific efficient polling such as epoll(), kqueue(), or IOCP, and falls back to poll() or select() if otherwise unsupported. At this point, it depends on basic threading primitives such as pthreads or windows threads. If enabled via the ARES_OPT_EVENT_THREAD option passed to ares_init_options(), then socket callbacks cannot be used. Fixes Bug: #611 Fix By: Brad House (@bradh352)	10 months ago
Brad House	7dd384a99c	fix test building with symbol hiding New test cases depend on internal symbols for calculating timeouts. Disable those test features if symbol hiding is enabled. Fixes Bug: #664 Fix By: Brad House (@bradh352)	11 months ago
Brad House	972f456f28	ares_cancel() could trigger callback with wrong response code (#663 ) When doing ares_gethostbyname() or ares_getaddrinfo() with AF_UNSPEC, if ares_cancel() was called after one address class was returned but before the other address class, it would return ARES_SUCCESS rather than ARES_ECANCELLED. Test case has been added for this specific condition. Fixes Bug: #662 Fix By: Brad House (@bradh352)	11 months ago
Brad House	a47b352258	enhance timeout test case to make sure it will re-use a previously downed server	12 months ago
Brad House	1edaa44107	enhance timeout test case	12 months ago
Brad House	f24d7c9b52	increment failures on timeout (#651 ) As of c-ares 1.22.0, server timeouts were erroneously not incrementing server failures meaning the server in use wouldn't rotate. There was apparently never a test case for this condition. This PR fixes the bug and adds a test case to ensure it behaves properly. Fixes Bug: #650 Fix By: Brad House (@bradh352)	12 months ago
Brad House	5262da7e88	now that warnings are enabled on test cases, clear a bunch of warnings	1 year ago
Brad House	4982f76a2f	Query Cache support (#625 ) This PR implements a query cache at the lowest possible level, the actual dns request and response messages. Only successful and `NXDOMAIN` responses are cached. The lowest TTL in the response message determines the cache validity period for the response, and is capped at the configuration value for `qcache_max_ttl`. For `NXDOMAIN` responses, the SOA record is evaluated. For a query to match the cache, the opcode, flags, and each question's class, type, and name are all evaluated. This is to prevent matching a cached entry for a subtly different query (such as if the RD flag is set on one request and not another). For things like ares_getaddrinfo() or ares_search() that may spawn multiple queries, each individual message received is cached rather than the overarching response. This makes it possible for one query in the sequence to be purged from the cache while others still return cached results which means there is no chance of ever returning stale data. We have had a lot of user requests to return TTLs on all the various parsers like `ares_parse_caa_reply()`, and likely this is because they want to implement caching mechanisms of their own, thus this PR should solve those issues as well. Due to the internal data structures we have these days, this PR is less than 500 lines of new code. Fixes #608 Fix By: Brad House (@bradh352)	1 year ago
Brad House	0cc570eabe	Implement ares_reinit() to reload system configuration into existing channel (#614 ) This PR implements ares_reinit() to safely reload a channel's configuration even if there are existing queries. This function can be called when system configuration is detected to be changed, however since c-ares isn't thread aware, care must be taken to ensure no other c-ares calls are in progress at the time this function is called. Also, this function may update the open file descriptor list so care must also be taken to wake any event loops and reprocess the list of file descriptors. Fixes Bug #301 Fix By: Brad House (@bradh352)	1 year ago
Brad House	a116fede19	remove tests that depend on randomness	1 year ago
Brad House	c8bd83a4ca	Dynamic Server List (#594 ) This PR makes the server list a dynamic sorted list of servers. The sort order is [ consecutive failures, system config index ]. The server list can be updated via ares_set_servers_*(). Any queries currently directed to servers that are no longer in the list will be automatically re-queued to a different server. Also, any time a failure occurs on the server, the sort order of the servers will be updated so that the one with the fewest consecutive failures is chosen for the next query that goes on the wire, this way bad or non-responsive servers are automatically isolated. Since the server list is now dynamic, the tracking of query failures per server has been removed and instead is relying on the server sort order as previously described. This simplifies the logic while also reducing the amount of memory required per query. However, because of this dynamic nature, it may not be easy to determine the server attempt order for enqueued queries if there have been any failures. If using the ARES_OPT_ROTATE, this is now implemented to be a random selection of the configured servers. Since the server list is dynamic, it's not possible to go to the next server as configuration could have changed between queries or attempts for the same query. Finally, this PR moved some existing functions into new files to logically separate them. This should address issues #550 and #440, while also setting the framework to implement #301. #301 needs a little more effort since it configures things other than the servers themselves (domains, search, sortlist, lookups), which need to make sure they can be safely updated. Fix By: Brad House (@bradh352)	1 year ago
Brad House	17931888ec	fix reference to freed memory (#562 ) Issue #561 shows free'd memory could be accessed in some error conditions. Fixes Issue #561 Fix By: Brad House (@bradh352)	1 year ago
Brad House	9e542a8839	reported build/test systems may timeout on intensive tests. reduce test case to still be relevant but to reduce false positive errors	1 year ago
Brad House	fab4039b9b	Fix for TCP back to back queries (#552 ) As per #266, TCP queries are basically broken. If we get a partial reply, things just don't work, but unlike UDP, TCP may get fragmented and we need to properly handle that. I've started creating a basic parser/buffer framework for c-ares for memory safety reasons, but it also helps for things like this where we shouldn't be manually tracking positions and fetching only a couple of bytes at a time from a socket. This parser/buffer will be expanded and used more in the future. This also resolves #206 by allowing NULL to be specified for some socket callbacks so they will auto-route to the built-in c-ares functions. Fixes: #206, #266 Fix By: Brad House (@bradh352)	1 year ago
Brad House	21f3b77440	ares_getaddrinfo(): Fail faster on AF_UNSPEC if we've already received one address class (#551 ) As per #541, when using AF_UNSPEC with ares_getaddrinfo() (and in turn with ares_gethostbyname()) if we receive a successful response for one address class, we should not allow the other address class to continue on with retries, just return the address class we have. This will limit the overall query time to whatever timeout remains for the pending query for the other address class, it will not, however, terminate the other query as it may still prove to be successful (possibly coming in less than a millisecond later) and we'd want that result still. It just turns off additional error processing to get the result back quicker. Fixes Bug: #541 Fix By: Brad House (@bradh352)	1 year ago
Brad House	dd93f30082	Configuration option to limit number of UDP queries per ephemeral port (#549 ) Add a new ARES_OPT_UDP_MAX_QUERIES option with udp_max_queries parameter that can be passed to ares_init_options(). This value defaults to 0 (unlimited) to maintain existing compatibility, any positive number will cause new UDP ephemeral ports to be created once the threshold is reached, we'll call these 'connections' even though it's technically wrong for UDP. Implementation Details: * Each server entry in a channel now has a linked-list of connections/ports for udp and tcp. The first connection in the list is the one most likely to be eligible to accept new queries. * Queries are now tracked by connection rather than by server. * Every time a query is detached from a connection, the connection that it was attached to will be checked to see if it needs to be cleaned up. * Insertion, lookup, and searching for connections has been implemented as O(1) complexity so the number of connections will not impact performance. * Remove is_broken from the server, it appears it would be set and immediately unset, so must have been invalidated via a prior patch. A future patch should probably track consecutive server errors and de-prioritize such servers. The code right now will always try servers in the order of configuration, so a bad server in the list will always be tried and may rely on timeout logic to try the next. * Various other cleanups to remove code duplication and for clarification. Fixes Bug: #444 Fix By: Brad House (@bradh352)	1 year ago
Brad House	cf99c025cf	Modernization: Implement base data-structures and replace usage (#540 ) c-ares currently lacks modern data structures that can make coding easier and more efficient. This PR implements a new linked list, skip list (sorted linked list), and hashtable implementation that are easy to use and hard to misuse. Though these implementations use more memory allocations than the prior implementation, the ability to more rapidly iterate on the codebase is a bigger win than any marginal performance difference (which is unlikely to be visible, modern systems are much more powerful than when c-ares was initially created). The data structure implementation favors readability and audit-ability over performance, however using the algorithmically correct data type for the purpose should offset any perceived losses. The primary motivation for this PR is to facilitate future implementation for Issues #444, #135, #458, and possibly #301 A couple additional notes: The ares_timeout() function is now O(1) complexity instead of O(n) due to the use of a skiplist. Some obscure bugs were uncovered which were actually being incorrectly validated in the test cases. These have been addressed in this PR but are not explicitly discussed. Fixed some dead code warnings in ares_rand for systems that don't need rc4 Fix By: Brad House (@bradh352)	1 year ago
Daniel Stenberg	c1b00c41a7	provide SPDX identifiers and a REUSE CI job to verify All files have their licence and copyright information clearly identifiable. If not in the file header, they are set separately in .reuse/dep5. All used license texts are provided in LICENSES/	1 year ago
bradh352	04cba3fb3c	detect oddities and skip test if necessary	3 years ago
bradh352	acb66087a5	bend over backwards for testing file access, something is weird on debian	3 years ago
bradh352	f4b9a43fc8	chmod(fn, 0) is failing on debian	3 years ago
bradh352	f03c8608a4	maybe process needs to be called	3 years ago
bradh352	43d8946dd1	INSTANTIATE_TEST_CASE_P -> INSTANTIATE_TEST_SUITE_P as new convention in googletest	3 years ago
Brad House	c642b9fbb1	Reimplement ares_gethostbyname() by wrapping ares_getaddrinfo() (#428 ) ares_gethostbyname() and ares_getaddrinfo() do a lot of similar things, however ares_getaddrinfo() has some desirable behaviors that should be imported into ares_gethostbyname(). For one, it sorts the address lists for the most likely to succeed based on the current system routes. Next, when AF_UNSPEC is specified, it properly handles search lists instead of first searching all of AF_INET6 then AF_INET, since ares_gethostbyname() searches in parallel. Therefore, this PR should also resolve the issues attempted in #94. A few things this PR does: 1. ares_parse_a_reply() and ares_parse_aaaa_reply() had very similar code to translate struct ares_addrinfo into a struct hostent as well as into struct ares_addrttl/ares_addr6ttl this has been split out into helper functions of ares__addrinfo2hostent() and ares__addrinfo2addrttl() to prevent this duplicative code. 2. ares_getaddrinfo() was apparently never honoring HOSTALIASES, and this was discovered once ares_gethostbyname() was turned into a wrapper, the affected test cases started failing. 3. A slight API modification to save the query hostname into struct ares_addrinfo as the last element of name. Since this is the last element, and all user-level instances of struct ares_addrinfo are allocated internally by c-ares, this is not an ABI-breaking change nor would it impact any API compatibility. This was needed since struct hostent has an h_name element. 4. Test Framework: MockServer tests via TCP would fail if more than 1 request was received at a time which is common when ares_getaddrinfo() queries for both A and AAAA records simultaneously. In fact, this was a long standing issue in which the ares_getaddrinfo() tests were bypassing TCP altogether. This has been corrected, the message is now processed in a loop. 5. Some tests had to be updated for overall correctness as they were invalid but somehow passing prior to this change. Change By: Brad House (@bradh352)	3 years ago
bradh352	f4c079d9d0	more portability updates	4 years ago
bradh352	498ce747d3	portability updates for test cases	4 years ago
Erik Lax	0c85e62af2	Detect remote DNS server does not support EDNS as per RFC 6891 (#244 ) EDNS retry should be based on FORMERR returned without an OPT RR record as per https://tools.ietf.org/html/rfc6891#section-7 rather than just treating any unexpected error condition as a reason to disable EDNS on the channel. Fix By: Erik Lax (@eriklax)	4 years ago
Fionn Fitzmaurice	6d6cd5daf6	Avoid buffer overflow in RC4 loop comparison (#336 ) The rc4 function iterates over a buffer of size buffer_len whose maximum value is INT_MAX with a counter of type short that is not guaranteed to have maximum size INT_MAX. In circumstances where short is narrower than int and where buffer_len is larger than the maximum value of a short, it may be possible to loop infinitely as counter will overflow and never be greater than or equal to buffer_len. The solution is to make the comparison be between types of equal width. This commit defines counter as an int. Fix By: Fionn Fitzmaurice (@fionn)	4 years ago
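
The last entry above (6d6cd5daf6) describes a loop whose counter type is narrower than the type of its bound. The following is a minimal sketch of that pitfall and of the "equal width" fix, not the actual c-ares rc4 code: `short_counter_can_reach` and `xor_buffer` are hypothetical names invented for illustration, and the single-byte XOR stands in for the real cipher step.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of c-ares): reports whether a loop counter of
 * type short could ever count up to buf_len. If it cannot, the condition
 * `counter < buf_len` can never become false, because the counter goes out of
 * range (typically wrapping to a negative value) before reaching the bound. */
static int short_counter_can_reach(int buf_len)
{
  return buf_len <= SHRT_MAX;
}

/* Hypothetical stand-in for a cipher loop: XOR every byte with `key`.
 * The counter has the same type as buf_len, so the comparison is between
 * values of equal width and the loop terminates for any valid length. */
static void xor_buffer(unsigned char *buf, int buf_len, unsigned char key)
{
  int counter; /* same width as buf_len; the buggy variant used short */

  assert(buf != NULL && buf_len >= 0);

  for (counter = 0; counter < buf_len; counter++) {
    buf[counter] = (unsigned char)(buf[counter] ^ key);
  }
}
```

Calling `xor_buffer()` on a buffer longer than `SHRT_MAX` bytes (for example 40000 bytes where short is 16 bits wide) processes every byte, whereas `short_counter_can_reach()` returns 0 for that length, which is the condition under which the original short counter could loop indefinitely.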
