New DNS record parsing code. The old code was basically just some helper macros and functions for parsing an entire DNS message. The caller had to know the RFCs to use the parsers, except for some pre-made exceptions. The new parsing code parses the entire DNS message into an opaque data structure in a memory safe manner with various accessors for reading and manipulating the data.
The existing parser helpers for the various record types were reimplemented as wrappers around the new parser.
The accessors allow easy iteration across the DNS record datastructure, and can be used to easily create dig-like output without needing to know anything about the various record types and formats as dynamic helpers are provided for enumeration of values and data types of those values.
At some point in the future, this new DNS record structure, accessors, and parser will be exposed publicly. This is not done at this point as we don't want to do that until the API is completely stable. Likely a write() function to output the DNS record back into an actual message buffer will be introduced with the stable API as well.
Some subtle bugs in the existing code were uncovered, some which had test cases which turned out to be bogus. Validation with third-party implementations (e.g. BIND9) were performed to validate such cases were indeed bugs.
Adding additional RR parsers such as for TLSA (#470) or SVCB/HTTPS (#566) are trivial now since focus can be put on only parsing the data within the RR, not the entire message. That said, as the new parser is not yet public, it isn't clear the best way to expose any new RRs (probably best to wait for the new parser to be public rather than hacking in another legacy function).
Some additional RRs that are part of DNS RFC1035 or EDNS RFC6891 that didn't have previously implemented parsers are now also implemented (e.g. HINFO, OPT). Any unrecognized RRs are encapsulated into a "RAW_RR" as binary data which can be inserted or extracted, but are otherwise not interpreted in any way.
Fix By: Brad House (@bradh352)
c-ares uses multiple code styles, standardize on one. Talking with @bagder he feels strongly about maintaining an 80 column limit, but feels less strongly about things I feel strongly about (like alignment).
Can re-run the formatter on the codebase via:
```
clang-format -i */*.c */*.h */*/*.c */*/*.h
```
Fix By: Brad House (@bradh352)
SonarCloud is outputting some code smells for things that aren't possible for C89. Hopefully setting the code standard to C89/C90 properly will fix those bogus warnings.
Fix By: Brad House (@bradh352)
Create ares_strlen() and ares_strcpy() in order to resolve SonarCloud codesmells related to their use.
ares_strlen() just becomes null-safe.
ares_strcpy() is equivalent to strlcpy(), so unlike strncpy() it guarantees NULL termination.
Fix By: Brad House (@bradh352)
PR #568 increased the warning levels and c-ares code emitted a bunch of warnings. This PR fixes those warnings and starts transitioning internal data types into more proper forms (e.g. data lengths should be size_t not int). It does, however, have to manually cast back to what the public API needs due to API and ABI compliance (we aren't looking to break integrations, just clean up internals).
Fix By: Brad House (@bradh352)
c-ares was missing a couple of common compiler warnings during building that are widely recognized as a best practice. This PR makes no code changes, only build system changes to increase warning levels.
This PR does cause some new warnings to be emitted, a follow-up PR will address those.
Fix By: Brad House (@bradh352)
c-ares currently uses int for boolean, which can be confusing as there are some functions which return int but use '0' as the success condition. Some internal variable usage is similar. Lets try to identify the boolean use cases and split them out into their own data type of ares_bool_t. Since we're trying to keep C89 compatibility, we can't rely on stdbool.h or the _Bool C99 data type, so we'll define our own.
Also, chose using an enum rather than say unsigned char or int because of the type safety benefits it provides. Compilers should warn if you try to pass, ARES_TRUE on to a ares_status_t enum (or similar) since they are different enums.
Fix By: Brad House (@bradh352)
A regression was introduced in 1.20.0 that would pass SOCK_STREAM on udp
connections due to code refactoring. If a client application validated this
data, it could cause issues as seen in gRPC.
Fixes Issue: #571
Fix By: Brad House (@bradh352)
In an attempt to see if ares_getsock() was broken as per #571, do
further sanity checks of the results of ares_getsock(). It seems
as though ares_getsock() is fine.
Fix By: Brad House (@bradh352)
If a flag is set to keep the connections to the DNS servers open even if there are no queries, the tools would not exit until the remote server closed the connection due to the user of ares_fds() to determine if there are any active queries. Instead, rely on ares_timeout() returning NULL if there are no active queries (technically this returns the value passed to max_tv in ares_timeout(), but in our use case, that is always NULL).
Fixes Issue: #452
Fix By: Brad House (@bradh352)
The list of possible error codes in c-ares was a #define list. This not only doesn't provide for any sort of type safety but it also lacks clarification on what a function may return or what it takes, as an int could be an ares status, a boolean, or possibly even a length in the current code.
We are not changing any public APIs as though the C standard states the underlying size and type of an enum is int, there are compiler attributes to override this as well as compiler flags like -fshort-enums. GCC in particular is known to expand an enum's width based on the data values (e.g., it can emit a 64bit integer enum).
All internal usages should be changed by this PR, but of course, there may be some I missed.
Fix By: Brad House (@bradh352)
When referring to another c-ares function use \fI function(3) \fP to let
the webpage rendering find and cross-link them appropriately.
SEE ALSO references should be ".BR name (3),", with a space before the
open parenthesis. This helps the manpage to HTML renderer.
Closes#565
The purpose of this PR is to hopefully make the private API of this set of routines less likely to need to be changed in a future release. While this is not a public API, it could become harder in the future to change usage as it becomes more widely used within c-ares.
Fix By: Brad House (@bradh352)
ares (and thus c-ares) was originally licensed under the 1989 MIT license text:
https://fedoraproject.org/wiki/Licensing:MIT#Old_Style_(no_advertising_without_permission)
This change updates the license to the modern MIT license as recognized here:
https://opensource.org/license/mit/
care has been taken to ensure correct attributions remain for the authors contained within the copyright headers, and all authors with attributions in the headers have been contacted for approval regarding the change. Any authors which were not able to be contacted, the original copyright maintains, luckily that exists in only a single file `ares_parse_caa_reply.c` at this time.
Please see PR #556 for the documented approvals by each contributor.
Fix By: Brad House (@bradh352)
The test framework was using 100ms timeout passed to select(), and not using ares_timeout() to calculate the actual recommended value based on the queries in queue. Using ares_timeout() tests the functionality of ares_timeout() itself and will provide more responsive results.
Fix By: Brad House (@bradh352)
As per #266, TCP queries are basically broken. If we get a partial reply, things just don't work, but unlike UDP, TCP may get fragmented and we need to properly handle that.
I've started creating a basic parser/buffer framework for c-ares for memory safety reasons, but it also helps for things like this where we shouldn't be manually tracking positions and fetching only a couple of bytes at a time from a socket. This parser/buffer will be expanded and used more in the future.
This also resolves#206 by allowing NULL to be specified for some socket callbacks so they will auto-route to the built-in c-ares functions.
Fixes: #206, #266
Fix By: Brad House (@bradh352)
The acountry utility required a third party DNSBL service from nerd.dk in order to operate. That service has been offline for about a year and there is no other comparable service offering. We are keeping the code in the repository as an example, but no longer building it.
Fixes: #537
Fix By: Brad House (@bradh352)
During ares_destroy(), any outstanding queries are terminated, however ares_getaddrinfo() had an ordering issue with status codes which in some circumstances could lead to a new query being enqueued rather than honoring the termination.
Fixes#532
Fix By: @Chilledheart and Brad House (@bradh352)
As per #541, when using AF_UNSPEC with ares_getaddrinfo() (and in turn with ares_gethostbynam()) if we receive a successful response for one address class, we should not allow the other address class to continue on with retries, just return the address class we have.
This will limit the overall query time to whatever timeout remains for the pending query for the other address class, it will not, however, terminate the other query as it may still prove to be successful (possibly coming in less than a millisecond later) and we'd want that result still. It just turns off additional error processing to get the result back quicker.
Fixes Bug: #541
Fix By: Brad House (@bradh352)
Add a new ARES_OPT_UDP_MAX_QUERIES option with udp_max_queries parameter that can be passed to ares_init_options(). This value defaults to 0 (unlimited) to maintain existing compatibility, any positive number will cause new UDP ephemeral ports to be created once the threshold is reached, we'll call these 'connections' even though its technically wrong for UDP.
Implementation Details:
* Each server entry in a channel now has a linked-list of connections/ports for udp and tcp. The first connection in the list is the one most likely to be eligible to accept new queries.
* Queries are now tracked by connection rather than by server.
* Every time a query is detached from a connection, the connection that it was attached to will be checked to see if it needs to be cleaned up.
* Insertion, lookup, and searching for connections has been implemented as O(1) complexity so the number of connections will not impact performance.
* Remove is_broken from the server, it appears it would be set and immediately unset, so must have been invalidated via a prior patch. A future patch should probably track consecutive server errors and de-prioritize such servers. The code right now will always try servers in the order of configuration, so a bad server in the list will always be tried and may rely on timeout logic to try the next.
* Various other cleanups to remove code duplication and for clarification.
Fixes Bug: #444
Fix By: Brad House (@bradh352)
A lot of time has passed since the original timeouts and retry counts were chosen. We have on and off issues reported due to this. Even on geostationary satellite links, latency is worst case around 1.5s. This PR changes the per-server timeout to 2s and the retry count lowered from 4 to 3.
Fix By: Brad House (@bradh352)
c-ares currently lacks modern data structures that can make coding easier and more efficient. This PR implements a new linked list, skip list (sorted linked list), and hashtable implementation that are easy to use and hard to misuse. Though these implementations use more memory allocations than the prior implementation, the ability to more rapidly iterate on the codebase is a bigger win than any marginal performance difference (which is unlikely to be visible, modern systems are much more powerful than when c-ares was initially created).
The data structure implementation favors readability and audit-ability over performance, however using the algorithmically correct data type for the purpose should offset any perceived losses.
The primary motivation for this PR is to facilitate future implementation for Issues #444, #135, #458, and possibly #301
A couple additional notes:
The ares_timeout() function is now O(1) complexity instead of O(n) due to the use of a skiplist.
Some obscure bugs were uncovered which were actually being incorrectly validated in the test cases. These have been addressed in this PR but are not explicitly discussed.
Fixed some dead code warnings in ares_rand for systems that don't need rc4
Fix By: Brad House (@bradh352)