These changes fix a few warnings emitted by recent versions of MSVC when compiling with -W4. Half of the changes are in Windows-specific code, and the other half should be safe no matter the compiler or OS.
The allocation function change is probably the only one that needs explanation. MSVC gives warnings about the function pointers not being stable across DLL boundaries or something to that effect, so for Windows, I've made them be called indirectly, which at least made the compiler happy. I can't say I've tested every linking combination on Windows with them before or after the change, but it seems harmless.
Fix By: Brad Spencer @b-spencer
PR: https://github.com/c-ares/c-ares/pull/192
* Harden and rationalize c-ares timeout computation
* Remove the rand() part of the timeout calculation completely.
When c-ares sends a DNS query, it computes the timeout for that request as follows:
timeplus = channel->timeout << (query->try_count / channel->nservers);
timeplus = (timeplus * (9 + (rand () & 7))) / 16;
I see two issues with this code. Firstly, when either try_count or channel->timeout are large enough, this can end up as an illegal shift.
Secondly, the algorithm for adding the random timeout (added in 2009) is surprising. The original commit that introduced this algorithm says it was done to avoid a "packet storm". But, the algorithm appears to only reduce the timeout by an amount proportional to the scaled timeout's magnitude. It isn't clear to me that, for example, cutting a 30 second timeout almost in half to roughly 17 seconds is appropriate. Even with the default timeout of 5000 ms, this algorithm computes values between 2812 ms and 5000 ms, which is enough to cause a slightly latent DNS response to get spuriously dropped.
If preventing the timers from all expiring at the same time really is desirable, then it seems better to extend the timeout by a small factor so that the application gets at least the timeout it asked for, and maybe a little more. In my experience, this is common practice for timeouts: applications expect that a timeout will happen at or after the designated time (but not before), allowing for delay in detecting and reporting the timeout. Furthermore, it seems like the timeout shouldn't be extended by very much (we don't want a 30 second timeout changing into a 45 second timeout, either).
Consider also the documentation of channel->timeout in ares_init_options():
The number of milliseconds each name server is given to respond to a query on the first try. (After the first try, the timeout algorithm becomes more complicated, but scales linearly with the value of timeout.) The default is five seconds.
In the current implementation, even the first try does not use the value that the user supplies; it will use anywhere between 56% and 100% of that value.
The attached patch attempts to address all of these concerns without trying to make the algorithm much more sophisticated. After performing a safe shift, this patch simply adds a small random timeout to the computed value of between 0 ms and 511 ms. I could see limiting the random amount to be no greater than a proportion of the configured magnitude, but I can't see scaling the random with the overall computed timeout. As far as I understand, the goal is just to schedule retries "not at the same exact time", so a small difference seems sufficient.
UPDATE: randomization removed.
Closes PR #187
Fix by: Brad Spencer
If bad data is passed to ares_set_servers_csv() or
ares_set_servers_ports_csv() it will clear the existing channel
configured DNS servers, then a call to ares_send() will fail due
to a bad malloc which may have undefined behavior.
The fix now only clears existing servers on success. An additional
sanity check was added in ares_send() to ensure nservers >= 1 or
will result in ARES_ESERVFAIL.
Bug: https://c-ares.haxx.se/mail/c-ares-archive-2018-03/0000.shtml
Reported-by: Francisco Sedano Crippa
Some docs aren't installed or not showing up on
https://c-ares.haxx.se/docs.html
due to not being listed in Makefile.inc. Add missing docs and
ensure docs are alphabetized.
* Add Google LLC to AUTHORS.
* android: Explicitly delete all JNI local references, and cache JNI method IDs at initialization.
* android: Only return ARES_ENOTINITIALIZED on failures in initialization code.
Some android systems like ARM64 may not have the __system_property_get
symbol in libc (but still have it in the public headers). Detect this
condition at build time. The __system_property_get method of retrieving
name servers is deprecated as of Oreo so should strictly be a fallback
mechanism anyhow.