c-ares

Commit Graph

Author	SHA1	Message	Date
Brad House	ed86a80634	Some upstream DNS servers are non-compliant with EDNS options Some DNS servers don't properly ignore unknown EDNS options as the spec says they must, and instead will return EFORMERR. See discussion roughly starting here: https://github.com/alpinelinux/docker-alpine/issues/366#issuecomment-2462530681 In this case the DNS server is known to support EDNS in general (as versions prior to c-ares 1.33, which used EDNS, worked), but when adding the EDNS DNS Cookie extension, they return EFORMERR. This is in violation of [RFC6891 6.1.2](https://datatracker.ietf.org/doc/html/rfc6891#section-6.1.2): > Any OPTION-CODE values not understood by a responder or requestor MUST be ignored. The server in this example actually echoes back the EDNS record, further causing confusion that makes you think they might understand the record. We need to catch an EFORMERR and re-attempt the query without EDNS completely since they are really non-compliant with EDNS. We may support additional EDNS extensions in the future and don't want to have to probe each individual extension with a braindead server. Fixes #911 Authored-By: Brad House (@bradh352)	3 months ago
Brad House	3899f7dba9	ares_getaddrinfo() for AF_UNSPEC should retry if ipv6 received We added an optimization to stop retries on other address classes on failures if one address class was received successfully. In production, however, some odd misconfigured use cases could mean an ipv6 address would be returned but the host was really only capable of connecting to ipv4 machines. We want to modify this optimization now to continue retries on ipv4 even if ipv6 was received, but NOT the other way around. It was always more likely that ipv6 resolution would cause the delays due to system issues, as the world still really only runs on ipv4... Authored-By: Brad House (@bradh352)	4 months ago
Brad House	e44b9163c1	try to fix compat with googletest 1.15 (#874) Due to running containerized tests we had to cut and paste the `TEST_P` macro definition in `googletest/include/gtest/gtest-param-test.h`, and modify it as it isn't designed to be wrapped in any way. Unfortunately this tends to change from release to release, usually in minor ways ... but also google test specifically doesn't advertise its own version so it can be hard to work around. Let's try to fix compatibility with google test 1.15 Fixes #873 Authored-By: Brad House (@bradh352)	6 months ago
Brad House	2c15a35aef	systemd-resolved handle ESERVFAIL/EREFUSED on single label names (#863) systemd-resolved will return `ESERVFAIL` or `EREFUSED` by default on single label domain names. See https://github.com/systemd/systemd/issues/34101 They've basically labeled this as a non-issue even though it appears to be in violation of the RFCs for downstream systems being able to support negative caching. I haven't tested with the suggested `ResolveUnicastSingleLabel=yes`, but that's off by default so it's unlikely people even know this exists. Since systemd is very prevalent these days, we really don't have much of a choice but to work around their design decisions. This PR implements this support, but only on single label names as it likely isn't valid otherwise. It also adds test cases to confirm this behavior. Fixes #852 Authored-By: Brad House (@bradh352)	6 months ago
Brad House	fe138bde53	Fix Sysconfig ndots default value and add test case (#862) As per #852 searching is failing, partially it is due to the ndots value not defaulting to a proper value on linux, and partially due to systemd-resolved returning the wrong error codes. This PR fixes the first issue and adds containerized test cases to validate the behavior and prevent issues in the future. Reported-By: Hans-Christian Egtvedt (@egtvedt) and Mikael Lindemann (@mikaellindemann) Authored-By: Brad House (@bradh352)	6 months ago
Brad House	9e574afc39	Blank DNS names would result in ARES_ENOMEM due to bug in query cache If a blank DNS name is used, the DNS query cache would fail due to an invalid sanity check. This can be legitimate such as: adig -t SOA . This fixes that situation as well as a few other spots that were uncovered and adds a test case to validate the behavior to ensure it won't regress in the future. Fixes #858 Reported-By: Nodar Chkuaselidze (@nodech) Authored-By: Brad House (@bradh352)	6 months ago
Brad House	d693951067	CI: Move more to GitHub actions including Containers (#842) GitHub actions supports running tests on various docker containers, move Ubuntu 20.04 and Alpine tests to containers. Also move iOS testing to GitHub actions since that runs on MacOS which is supported. This should take additional load off of Cirrus-CI which consumes credits like crazy. This leaves only FreeBSD and Linux ARM testing on Cirrus-CI. Authored-By: Brad House (@bradh352)	7 months ago
Brad House	0e4b735a10	Data Structure: Dynamic Array (#841) Create a new data structure for a basic growable indexable array, supporting the features you'd normally expect such as insert (at, last, first), remove (at, last, first), and get (at, last, first). Internally all data is stored in an appropriately sized array that can directly be returned to the caller as a C array of the data type provided. The array grows by powers of two, and has optimizations for head and tail removals. Array modifications can be risky (e.g. wrong reallocation sizes, mis-sized memory moves, out-of-bounds access), so it makes sense to have standardized code that is well tested. The arrays used by the dns record parser / writer have been converted to the new array data structure as have a few other instances. Authored-By: Brad House (@bradh352)	7 months ago
Brad House	dc423fb856	Implement TCP FastOpen (TFO) RFC7413 (#840) TCP Fast Open (TFO) allows TCP connection establishment in 0-RTT when a client and server have previously communicated. The SYN packet will also contain the initial data packet from the client to the server. This means there should be virtually no slowdown over UDP when both sides support TCP FastOpen, which is unfortunately not always the case. For instance, `1.1.1.1` appears to support TFO, however `8.8.8.8` does not. This implementation supports Linux, Android, FreeBSD, MacOS, and iOS. While Windows does have support for TCP FastOpen it does so via completion APIs only, and that can't be used with polling APIs like those used by every other OS. We could implement it in the future if desired for those using `ARES_OPT_EVENT_THREAD`, but it would probably require adopting IOCP completely on Windows. Sysctls are required to be set appropriately: - Linux: `net.ipv4.tcp_fastopen`: - `1` = client only (typically default) - `2` = server only - `3` = client and server - MacOS: `net.inet.tcp.fastopen` - `1` = client only - `2` = server only - `3` = client and server (typically default) - FreeBSD: `net.inet.tcp.fastopen.server_enable` (boolean) and `net.inet.tcp.fastopen.client_enable` (boolean) This feature is always-on, when running on an OS with the capability enabled. Though some middleboxes have impacted end-to-end TFO and caused connectivity errors, all modern OSs perform automatic blackholing of IPs that have issues with TFO. This is not expected to cause any issues in modern implementations. This will also help with improving latency for future DoT and DoH implementations. Authored-By: Brad House (@bradh352)	7 months ago
Brad House	ac33bdc7c2	Refactor connection handling (#839) Refactor some connection handling to reduce code duplication and to unify the TCP and UDP codepaths a bit more. This will make some future changes easier to make. This also does some structure renaming to better conform with current standards: - `struct server_state` -> `ares_server_t` - `struct server_connection` -> `ares_conn_t` - `struct query` -> `ares_query_t` Authored-by: Brad House (@bradh352)	7 months ago
Brad House	c9c235761f	server cookie wasn't being passed back due to missing length During the implementation of server cookies, test cases were missing to validate the server cookie in a prior reply was passed back, and it turns out they were not. This also adds tests for verification. Fix By: Brad House (@bradh352)	7 months ago
Brad House	4bedfd0d55	Add DNS cookie support (RFC7873 + RFC9018) (#833) DNS cookies are a simple form of learned mutual authentication supported by most DNS server implementations these days and can help prevent DNS Cache Poisoning attacks for clients and DNS amplification attacks for servers. Fixes #620 Fix By: Brad House (@bradh352)	7 months ago
Brad House	b827cdc52e	fix build failure in tests	7 months ago
Brad House	9db6f9dde8	tests: Disable LotsOfConnections test Disable a test meant for Windows event load testing. It's not meant to be something for general testing. Fix By: Brad House (@bradh352)	7 months ago
Brad House	5c555b7c0b	clang-format	7 months ago
Brad House	341b34d140	tests: reduce required testing time for ServerFailoverOpts on most platforms	7 months ago
Brad House	2b80c0f7d9	msvc Makefiles: Remove support for MSVC 6 and 7 since we can't target legacy Windows versions supported by those compilers anymore	7 months ago
Brad House	e84c7d1f61	test: bypass BadLoopbackServerNoTimeouts strict validation on NetBSD	7 months ago
Brad House	8a53099184	test: ServerFailoverOpts can fail on heavily loaded systems due to its reliance on sleep and time. Try to harden it a little bit	7 months ago
Brad House	130fd4794b	Reorganize source tree (#822) c-ares is getting larger these days and we keep adding source files to the same directory so it can be hard to differentiate core c-ares implementation from library/utility functions. Let's make some subdirectories to help with that and shuffle files around. Fix By: Brad House (@bradh352)	7 months ago
Brad House	a548eabbe6	UDP write may fail indicating host isn't reachable (#821) UDP is connectionless, but systems use ICMP unreachable messages to indicate there is no ability to reach the host or port, which can result in a `send()` returning an error like `ECONNREFUSED`. We need to handle non-retryable codes like that to treat it as a connection failure so we requeue any queries on that connection to another connection/server immediately. Otherwise what happens is we just wait on the timeout to expire which can greatly increase the time required to get a definitive message. This also adds a test case to verify the behavior. Fixes #819 Fix By: Brad House (@bradh352)	7 months ago
Brad House	529906d1cc	Prevent complex recursion during query requeuing and connection cleanup c-ares utilizes recursion for some operations, and some of these processes can have unintended side effects, such as if a callback is called that then recurses into the same function. This can cause strange cleanup conditions that lead to crashes. Try to disassociate queries with connections as early as possible and move cleaning up unneeded connections to its own scan rather than trying to detect each time a query is disassociated from a connection. Fix By: Brad House (@bradh352)	7 months ago
Brad House	a588812a3a	propagate actual error condition on requeue	7 months ago
Brad House	1dcc170a81	Issue #819: preliminary test case	7 months ago
Brad House	44f0cc7457	prevent SIGPIPE from being generated	7 months ago
Brad House	f68992a159	Propagate record duplication error code (#820) In c-ares 1.30.0 we started validating that parsed strings are printable. This caused a regression in a pycares test case due to a wrong response code being returned as the error was being propagated from a different section of code that was assuming the only possible failure condition was out-of-memory. This PR adds a fix for this and also a test case to validate it. Ref: https://github.com/saghul/pycares/issues/200 Fix By: Brad House (@bradh352)	7 months ago
Brad House	ef6a3dfe76	CI: Add solaris (#814)	7 months ago
Brad House	b19c186ce7	Rework WinAFD event code (#811) We've had reports of use-after-free type crashes in Windows cleanup code for the Event Thread. In evaluating the code, it appeared there were some memory leaks on per-connection handles that may have remained open during shutdown. While trying to resolve that, it became apparent the methodology chosen may not have been the right one for interfacing with the Windows AFD system, as stability issues were seen during this debugging process. Since this system is completely undocumented, there was no clear resolution path other than to switch to the other methodology which involves directly opening `\Device\Afd`, rather than spawning a "peer socket" to use to queue AFD operations. The original methodology chosen more closely resembled what is employed by [libuv](https://github.com/libuv/libuv) and given its widespread use was the reason it was used. The new methodology more closely resembles [wepoll](https://github.com/piscisaureus/wepoll). It's not clear if there are any scalability or performance advantages or disadvantages for either method. They both seem like different ways to do the same thing, but this current way does seem more stable. Fixes #798 Fix By: Brad House (@bradh352)	7 months ago
Brad House	f90a81ed81	tests: use std::chrono instead of pulling in ares__tvnow and ares__timeval_remaining (#809) This will allow more tests to run even when internal symbols aren't accessible. Fix By: Brad House (@bradh352)	8 months ago
Brad House	b649b85917	tests: fix compile warning	8 months ago
Brad House	614bdd88b9	Tests: fix test cleanup race condition (#803) There was a thread passed data for processing that was cleaned up before thread exit, and it could cause a use-after-free in the test suite. This doesn't affect c-ares. This was found while trying to reproduce #798, but appears unrelated. Don't use a helper thread, as it isn't necessary. Fix By: Brad House (@bradh352)	8 months ago
Brad House	378d26144d	DNS RR TXT strings should not be automatically concatenated (#801) As per #738, there are use cases where the DNS TXT record strings should not be concatenated like RFC 7208 indicates. We cannot break ABI with those using the new API, so we need to support retrieving the concatenated version as well as a new API to retrieve the individual strings which will be used by `ares_parse_text_reply_ext()` to restore the old behavior prior to c-ares 1.20. Fixes Issue: #738 Fix By: Brad House (@bradh352)	8 months ago
Brad House	70f10a85f3	DNS 0x20 implementation (#800) This PR enables DNS 0x20 as per https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00 . DNS 0x20 adds additional entropy to the request by randomly altering the case of the DNS question to help prevent cache poisoning attacks. Google DNS has implemented this support as of 2023, even though this is a proposed and expired standard from 2008: https://groups.google.com/g/public-dns-discuss/c/KxIDPOydA5M There have been documented cases of name server and caching server non-conformance, though it is expected to become more rare, especially since Google has started using this. This can be enabled via the `ARES_FLAG_DNS0x20` flag, which is currently disabled by default. The test cases do however enable this flag to validate this feature. Implementors using this flag will notice that responses will retain the mixed case, but since DNS names are case-insensitive, any proper implementation should not be impacted. There is currently no fallback mechanism implemented as it isn't immediately clear how this may affect a stub resolver like c-ares where we aren't querying the authoritative name server, but instead an intermediate recursive resolver where some domains may return invalid results while others return valid results, all while querying the same nameserver. Likely using DNS cookies as suggested by #620 is a better mechanism to fight cache poisoning attacks for stub resolvers. TCP queries do not use this feature even if the `ARES_FLAG_DNS0x20` flag is specified since they are not subject to cache poisoning attacks. Fixes Issue: #795 Fix By: Brad House (@bradh352)	8 months ago
Brad House	c96200353d	valgrind: fix warning in test case	8 months ago
Brad House	51ca744459	Clean up header inclusion, simplification (#797) The header inclusion logic in c-ares is hard to follow. Let's try to simplify the way it works to make it easier to understand and less likely to break on new code changes. There's still more work to be done, but this is a good start at simplifying things. Fix By: Brad House (@bradh352)	8 months ago
Brad House	9b6f197fec	warning in test	8 months ago
Brad House	1b8cfdedc9	build fix	8 months ago
Brad House	853244bc58	build fix	8 months ago
Brad House	54808a5190	fix comments	8 months ago
Brad House	8293a05f63	cleanup more warnings due to new compiler flags	8 months ago
Brad House	bbcb1a2bdf	clang-format	8 months ago
Brad House	827a1d523c	ares_queryloop: output server list	8 months ago
Brad House	f8d1e63840	ares_queryloop: properly capture CTRL-C and cleanup	8 months ago
Brad House	7ea18a83b3	test: clean up some minor warnings	8 months ago
Brad House	f9faa3f05c	try to work around windows ASAN issue by not using std::string::npos	8 months ago
Brad House	268092a390	MSVC: enable strict warnings (#792) MSVC has been building with /W3 which isn't considered a safe level for modern code. /W4 is recommended, but it too is lacking some recommended options, so we enable /W4 and also the recommended options. We do, however, have to disable a couple of options due to Windows headers not being fully compliant sometimes as well as some things we do in c-ares that it doesn't like, but aren't actually bad. Fix By: Brad House (@bradh352)	8 months ago
Brad House	4248c642d2	Enable QueryCache by default (#786) The query cache should be enabled by default. This will help with determining proper timeouts for #736. It can still be disabled by setting the ttl to 0. There should be no negative consequences of this in real-world scenarios since DNS is based on the TTL concept and upstream servers will cache results and not recurse based on this information anyhow. DNS queries and responses are very small, this should have negligible impact on memory consumption. Fix By: Brad House (@bradh352)	8 months ago
Brad House	f05465e59b	tests: set ndots:1 as default, don't honor system config as it may skew results	8 months ago
Brad House	c0d41d08ab	Coverage code annotations for identification of desirable paths that need testing (#775) Add code annotations for ignoring specific code paths for coverage calculations. The primary purpose of this is to make it easy to see the code paths that we could (and probably should) write test cases for, as these would have the most impact on delivery of a stable product. The annotations used are: `LCOV_EXCL_LINE: <designation>`, `LCOV_EXCL_START: <designation>`, `LCOV_EXCL_STOP` Unfortunately `LCOV_EXCL_BR_LINE` does not appear to be supported by coveralls as it would have been a more elegant solution over START/STOP. We specifically include the `<designation>` not just for future reference but because it makes it easy to identify in case we want to address these conditions in a different way in the future. The main areas designated for exclusion are: 1. `OutOfMemory` - these are hard-to-test cases, and on modern systems, are likely to never occur due to optimistic memory allocations, which can then later cause the kernel to terminate your application due to memory not actually being available. c-ares does have some testing framework for this, if we wish to expand in the future, we can easily use sed to get rid of these annotations. 2. `DefensiveCoding` - these are impossible to reach paths at the point in time the code was written. They are there for defensive coding in case code is refactored in the future to prevent unexpected behavior. 3. `UntestablePath` - these are code paths that aren't possible to test, such as failure of a system call. 4. `FallbackCode` - This is an entire set of code that is untestable because it's not able to simulate a failure of the primary path. This PR also does add some actual coverage in the test cases where it is easy to do. Fix By: Brad House (@bradh352)	8 months ago
Gregor Jasny	9d36fd2030	fix some obvious errors reported by the CLion Project Analyzer (#779) Fix By: Gregor Jasny (@gjasny)	9 months ago


187 Commits (9f81bf6fde0b1c1ca943e00a8e93d96f3d1aa556)
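
Commit 0e4b735a10 describes a growable, indexable array that grows by powers of two and hands its storage back as a plain C array. A minimal sketch of that idea follows. The `int_array_t` type and the `int_array_*` helpers are hypothetical names chosen for illustration and are not the c-ares API; the head and tail removal optimizations mentioned in the commit are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of ints (illustration only, not c-ares code). */
typedef struct {
  int   *data; /* contiguous storage, usable directly as a C array */
  size_t len;  /* number of elements in use */
  size_t cap;  /* allocated capacity: 0 or a power of two */
} int_array_t;

/* Make room for at least `need` elements by doubling the capacity.
 * Returns 0 on success, -1 on overflow or allocation failure. */
int int_array_reserve(int_array_t *a, size_t need)
{
  size_t cap;
  int   *p;

  assert(a != NULL);
  if (need <= a->cap) {
    return 0;
  }
  cap = a->cap ? a->cap : 4;
  while (cap < need) {
    if (cap > SIZE_MAX / sizeof(int) / 2) {
      return -1; /* doubling would overflow the byte count */
    }
    cap *= 2;
  }
  p = realloc(a->data, cap * sizeof(*p));
  if (p == NULL) {
    return -1; /* the old buffer is still valid and still owned by `a` */
  }
  a->data = p;
  a->cap  = cap;
  return 0;
}

/* Insert `value` at index `idx` (0..len), shifting the tail right. */
int int_array_insert_at(int_array_t *a, size_t idx, int value)
{
  assert(a != NULL);
  if (idx > a->len || int_array_reserve(a, a->len + 1) != 0) {
    return -1;
  }
  memmove(a->data + idx + 1, a->data + idx, (a->len - idx) * sizeof(int));
  a->data[idx] = value;
  a->len++;
  return 0;
}

/* Remove the element at `idx`, optionally returning it through `out`. */
int int_array_remove_at(int_array_t *a, size_t idx, int *out)
{
  assert(a != NULL);
  if (idx >= a->len) {
    return -1;
  }
  if (out != NULL) {
    *out = a->data[idx];
  }
  memmove(a->data + idx, a->data + idx + 1,
          (a->len - idx - 1) * sizeof(int));
  a->len--;
  return 0;
}

/* Release the storage and reset the array to its empty state. */
void int_array_free(int_array_t *a)
{
  assert(a != NULL);
  free(a->data);
  a->data = NULL;
  a->len  = 0;
  a->cap  = 0;
}
```

Keeping the bounds checks and the `memmove` sizes in one place is the point the commit makes: callers never compute reallocation sizes themselves.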
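
Commit 70f10a85f3 describes DNS 0x20 as randomly altering the case of the DNS question, with responses retaining the mixed case. The sketch below illustrates the technique only. The `dns0x20_*` function names are hypothetical and do not come from c-ares, and `rand()` is an assumption made to keep the example short; a real resolver would draw the bits from a cryptographically secure source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: randomly flip the case of each ASCII letter in a
 * DNS name, in place. Non-letters (dots, digits, hyphens) are untouched. */
void dns0x20_randomize(char *name)
{
  size_t i;

  assert(name != NULL);
  for (i = 0; name[i] != '\0'; i++) {
    unsigned char c = (unsigned char)name[i];
    if (isalpha(c) && (rand() & 1)) {
      name[i] = (char)(c ^ 0x20); /* bit 0x20 separates 'A' from 'a' */
    }
  }
}

/* Hypothetical check: the question echoed in a response must match the
 * one that was sent byte for byte, including case. Returns 1 on a match. */
int dns0x20_matches(const char *sent, const char *echoed)
{
  return strcmp(sent, echoed) == 0;
}

/* DNS names are case-insensitive, so the mixed-case name still refers to
 * the same domain. Returns 1 if the names are equal ignoring ASCII case. */
int dns0x20_equal_nocase(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
      return 0;
    }
    a++;
    b++;
  }
  return *a == *b; /* both strings must end at the same position */
}
```

An off-path attacker who forges a reply has to guess the case pattern in addition to the query ID and source port, which is where the extra entropy comes from.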
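
Commit c0d41d08ab lists the coverage annotations `LCOV_EXCL_LINE: <designation>`, `LCOV_EXCL_START: <designation>` and `LCOV_EXCL_STOP`. The sketch below shows how such markers might sit in C source. The `dup_string` helper is hypothetical and not taken from c-ares, and the `DefensiveCoding` label assumes every caller passes a non-NULL string.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only): duplicate a C string. */
char *dup_string(const char *s)
{
  size_t len;
  char  *out;

  if (s == NULL) {
    return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
  }

  len = strlen(s);
  out = malloc(len + 1);
  if (out == NULL) {
    /* LCOV_EXCL_START: OutOfMemory */
    errno = ENOMEM;
    return NULL;
    /* LCOV_EXCL_STOP */
  }

  memcpy(out, s, len + 1);  /* len + 1 copies the terminating NUL too */
  assert(out[len] == '\0'); /* sanity check on the copy */
  return out;
}
```

Because each marker carries its designation, a search for `OutOfMemory` or `DefensiveCoding` finds every excluded path of that kind if the project later decides to test them.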