mirror of https://github.com/c-ares/c-ares.git
Automatic query timeout adjustment based on server history (#794)
With very little effort we should be able to determine reasonable timeouts based on prior query history. We track this history so we can auto-scale when network conditions change (e.g. a provider failover that changes timings). Apple appears to do this within their system resolver in macOS. Obviously we should have a minimum, maximum, and initial value to make sure the algorithm doesn't somehow go off the rails.

Values:
- Minimum Timeout: 250ms (approximate RTT half-way around the globe)
- Maximum Timeout: 5000ms (recommended timeout in RFC 1123). Can be reduced by ARES_OPT_MAXTIMEOUTMS, but otherwise the bound specified by the option caps the retry timeout.
- Initial Timeout: User-specified via configuration or ARES_OPT_TIMEOUTMS
- Average Latency Multiplier: 5x (a local DNS server returning a cached value will be quicker than if it needs to recurse, so we need to account for this)
- Minimum Count for Average: 3. This is the minimum number of queries we need to form an average for the bucket.

Per-server buckets for tracking latency over time (these are ephemeral, meaning they don't persist once a channel is destroyed). We record both the current timespan for the bucket and the immediately preceding timespan, so that after a roll-over we can still use recent metrics for calculations:
- 1 minute
- 15 minutes
- 1 hr
- 1 day
- since inception

Each bucket contains:
- timestamp (divided by interval)
- minimum latency
- maximum latency
- total time
- count

NOTE: average latency is (total time / count), we will calculate this dynamically when needed

Basic algorithm for calculating timeout to use would be:
- Scan from most recent bucket to least recent
- Check timestamp of bucket; if it doesn't match the current time, or the count is not at least the "Minimum Count for Average", check the previous timespan of the same bucket
- If the previous timespan doesn't qualify either, continue to next bucket
- If we reached the end with no bucket match, use "Initial Timeout"
- If bucket is selected, take ("total time" / count) as Average latency, multiply by "Average Latency Multiplier", bound by "Minimum Timeout" and "Maximum Timeout"

NOTE: The timeout calculated may not be the timeout used. If we are retrying the query on the same server another time, then it will use a larger value

On each query reply where the response is legitimate (proper response or NXDOMAIN) and not something like a server error:
- Cycle through each bucket in order
- Check timestamp of bucket against current timestamp, if out of date overwrite previous entry with values, clear current values
- Compare current minimum and maximum recorded latency against query time and adjust if necessary
- Increment "count" by 1 and "total time" by the query time

Other Notes:
- This is always-on, the only user-configurable value is the initial timeout, which simply reuses the existing option.
- Minimum and Maximum latencies for a bucket are currently unused but are there in case we find a need for them in the future.

Fixes Issue: #736
Fix By: Brad House (@bradh352)
pull/797/head
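The record and lookup steps described above can be sketched for a single bucket as follows. This is a minimal illustration under stated assumptions, not the code in this commit: every sketch_* name and SKETCH_* constant is hypothetical, time is passed in as plain seconds, and only one bucket is modeled, where the real change scans several per server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants mirror the values listed in the commit message. */
#define SKETCH_MIN_TIMEOUT_MS 250
#define SKETCH_MAX_TIMEOUT_MS 5000
#define SKETCH_MULTIPLIER     5
#define SKETCH_MIN_COUNT      3

/* Hypothetical, simplified bucket: the current timespan plus the one before. */
typedef struct {
  int64_t  ts;            /* current timespan index (seconds / interval) */
  uint64_t total_ms;      /* sum of latencies in the current timespan */
  uint64_t count;         /* number of samples in the current timespan */
  int64_t  prev_ts;       /* same three fields for the preceding timespan */
  uint64_t prev_total_ms;
  uint64_t prev_count;
} sketch_bucket_t;

/* Record one latency sample, rolling the bucket over if the timespan changed. */
static void sketch_record(sketch_bucket_t *b, int64_t now_sec,
                          int64_t interval_sec, unsigned int latency_ms)
{
  int64_t ts;

  assert(interval_sec > 0);
  ts = now_sec / interval_sec;

  if (ts != b->ts) {
    /* Out of date: current values become the previous entry, then clear. */
    b->prev_ts       = b->ts;
    b->prev_total_ms = b->total_ms;
    b->prev_count    = b->count;
    b->ts            = ts;
    b->total_ms      = 0;
    b->count         = 0;
  }

  b->total_ms += latency_ms;
  b->count++;
}

/* Derive a timeout from one bucket. user_max_ms == 0 means "not set". */
static size_t sketch_timeout(const sketch_bucket_t *b, int64_t now_sec,
                             int64_t interval_sec, size_t initial_ms,
                             size_t user_max_ms)
{
  int64_t ts = now_sec / interval_sec;
  size_t  timeout_ms;

  if (b->ts == ts && b->count >= SKETCH_MIN_COUNT) {
    /* Current timespan has enough samples. */
    timeout_ms = (size_t)(b->total_ms / b->count) * SKETCH_MULTIPLIER;
  } else if (b->prev_ts == ts - 1 && b->prev_count >= SKETCH_MIN_COUNT) {
    /* Fall back to the immediately preceding timespan. */
    timeout_ms = (size_t)(b->prev_total_ms / b->prev_count) * SKETCH_MULTIPLIER;
  } else {
    /* No usable history: use the initial timeout. */
    timeout_ms = initial_ms;
  }

  /* Bound by the minimum and maximum timeouts. */
  if (timeout_ms < SKETCH_MIN_TIMEOUT_MS) {
    timeout_ms = SKETCH_MIN_TIMEOUT_MS;
  }
  if (user_max_ms != 0 && timeout_ms > user_max_ms) {
    timeout_ms = user_max_ms;
  } else if (timeout_ms > SKETCH_MAX_TIMEOUT_MS) {
    timeout_ms = SKETCH_MAX_TIMEOUT_MS;
  }
  return timeout_ms;
}
```

With a 60-second interval, three 100ms samples recorded in the same minute give an average of 100ms and therefore a 500ms timeout. With fewer than three samples the sketch falls back to the initial timeout.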
parent 0b50983539
commit a488525f08
6 changed files with 358 additions and 40 deletions
@@ -0,0 +1,260 @@
/* MIT License
 *
 * Copyright (c) 2024 Brad House
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 * SPDX-License-Identifier: MIT
 */


/* IMPLEMENTATION NOTES
 * ====================
 *
 * With very little effort we should be able to determine reasonable timeouts
 * based on prior query history.  We track this history so we can auto-scale
 * when network conditions change (e.g. a provider failover that changes
 * timings).  Apple appears to do this within their system resolver in macOS.
 * Obviously we should have a minimum, maximum, and initial value to make sure
 * the algorithm doesn't somehow go off the rails.
 *
 * Values:
 * - Minimum Timeout: 250ms (approximate RTT half-way around the globe)
 * - Maximum Timeout: 5000ms (recommended timeout in RFC 1123).  Can be reduced
 *   by ARES_OPT_MAXTIMEOUTMS, but otherwise the bound specified by the option
 *   caps the retry timeout.
 * - Initial Timeout: User-specified via configuration or ARES_OPT_TIMEOUTMS
 * - Average Latency Multiplier: 5x (a local DNS server returning a cached
 *   value will be quicker than if it needs to recurse, so we need to account
 *   for this)
 * - Minimum Count for Average: 3.  This is the minimum number of queries we
 *   need to form an average for the bucket.
 *
 * Per-server buckets for tracking latency over time (these are ephemeral,
 * meaning they don't persist once a channel is destroyed).  We record both the
 * current timespan for the bucket and the immediately preceding timespan, so
 * that after a roll-over we can still use recent metrics for calculations:
 * - 1 minute
 * - 15 minutes
 * - 1 hr
 * - 1 day
 * - since inception
 *
 * Each bucket would contain:
 * - timestamp (divided by interval)
 * - minimum latency
 * - maximum latency
 * - total time
 * - count
 * NOTE: average latency is (total time / count), we will calculate this
 *       dynamically when needed
 *
 * Basic algorithm for calculating timeout to use would be:
 * - Scan from most recent bucket to least recent
 * - Check timestamp of bucket; if it doesn't match the current time, or the
 *   count is not at least the "Minimum Count for Average", check the previous
 *   timespan of the same bucket
 * - If the previous timespan doesn't qualify either, continue to next bucket
 * - If we reached the end with no bucket match, use "Initial Timeout"
 * - If bucket is selected, take ("total time" / count) as Average latency,
 *   multiply by "Average Latency Multiplier", bound by "Minimum Timeout" and
 *   "Maximum Timeout"
 * NOTE: The timeout calculated may not be the timeout used.  If we are retrying
 *       the query on the same server another time, then it will use a larger
 *       value
 *
 * On each query reply where the response is legitimate (proper response or
 * NXDOMAIN) and not something like a server error:
 * - Cycle through each bucket in order
 * - Check timestamp of bucket against current timestamp, if out of date
 *   overwrite previous entry with values, clear current values
 * - Compare current minimum and maximum recorded latency against query time and
 *   adjust if necessary
 * - Increment "count" by 1 and "total time" by the query time
 *
 * Other Notes:
 * - This is always-on, the only user-configurable value is the initial
 *   timeout, which simply reuses the existing option.
 * - Minimum and Maximum latencies for a bucket are currently unused but are
 *   there in case we find a need for them in the future.
 */

#include "ares_setup.h"
#include "ares.h"
#include "ares_private.h"

/*! Minimum timeout value. Chosen due to it being approximately RTT half-way
 *  around the world */
#define MIN_TIMEOUT_MS 250

/*! Multiplier to apply to average latency to come up with an initial timeout */
#define AVG_TIMEOUT_MULTIPLIER 5

/*! Upper timeout bound; channel->maxtimeout, if set, can only lower it */
#define MAX_TIMEOUT_MS 5000

/*! Minimum queries required to form an average */
#define MIN_COUNT_FOR_AVERAGE 3

static time_t ares_metric_timestamp(ares_server_bucket_t  bucket,
                                    const ares_timeval_t *now,
                                    ares_bool_t           is_previous)
{
  time_t divisor = 1; /* Silence bogus MSVC warning by setting default value */

  switch (bucket) {
    case ARES_METRIC_1MINUTE:
      divisor = 60;
      break;
    case ARES_METRIC_15MINUTES:
      divisor = 15 * 60;
      break;
    case ARES_METRIC_1HOUR:
      divisor = 60 * 60;
      break;
    case ARES_METRIC_1DAY:
      divisor = 24 * 60 * 60;
      break;
    case ARES_METRIC_INCEPTION:
      return is_previous ? 0 : 1;
    case ARES_METRIC_COUNT:
      return 0; /* Invalid! */
  }

  if (is_previous) {
    if (divisor >= now->sec) {
      return 0;
    }
    return (time_t)((now->sec - divisor) / divisor);
  }

  return (time_t)(now->sec / divisor);
}

void ares_metrics_record(const struct query *query, struct server_state *server,
                         ares_status_t status, const ares_dns_record_t *dnsrec)
{
  ares_timeval_t       now = ares__tvnow();
  ares_timeval_t       tvdiff;
  unsigned int         query_ms;
  ares_dns_rcode_t     rcode;
  ares_server_bucket_t i;

  if (status != ARES_SUCCESS) {
    return;
  }

  if (server == NULL) {
    return;
  }

  /* Only proper responses and NXDOMAIN count as legitimate latency samples */
  rcode = ares_dns_record_get_rcode(dnsrec);
  if (rcode != ARES_RCODE_NOERROR && rcode != ARES_RCODE_NXDOMAIN) {
    return;
  }

  /* Convert the elapsed time to milliseconds (seconds must be scaled too) */
  ares__timeval_diff(&tvdiff, &query->ts, &now);
  query_ms = (unsigned int)((tvdiff.sec * 1000) + (tvdiff.usec / 1000));
  if (query_ms == 0) {
    query_ms = 1;
  }

  /* Place in each bucket */
  for (i = 0; i < ARES_METRIC_COUNT; i++) {
    time_t ts = ares_metric_timestamp(i, &now, ARES_FALSE);

    /* Copy metrics to prev and clear */
    if (ts != server->metrics[i].ts) {
      server->metrics[i].prev_ts          = server->metrics[i].ts;
      server->metrics[i].prev_total_ms    = server->metrics[i].total_ms;
      server->metrics[i].prev_total_count = server->metrics[i].total_count;
      server->metrics[i].ts               = ts;
      server->metrics[i].latency_min_ms   = 0;
      server->metrics[i].latency_max_ms   = 0;
      server->metrics[i].total_ms         = 0;
      server->metrics[i].total_count      = 0;
    }

    if (server->metrics[i].latency_min_ms == 0 ||
        server->metrics[i].latency_min_ms > query_ms) {
      server->metrics[i].latency_min_ms = query_ms;
    }

    /* Track the maximum (must update latency_max_ms, not latency_min_ms) */
    if (query_ms > server->metrics[i].latency_max_ms) {
      server->metrics[i].latency_max_ms = query_ms;
    }

    server->metrics[i].total_count++;
    server->metrics[i].total_ms += (ares_uint64_t)query_ms;
  }
}

size_t ares_metrics_server_timeout(const struct server_state *server,
                                   const ares_timeval_t      *now)
{
  const ares_channel_t *channel = server->channel;
  ares_server_bucket_t  i;
  size_t                timeout_ms = 0;


  for (i = 0; i < ARES_METRIC_COUNT; i++) {
    time_t ts = ares_metric_timestamp(i, now, ARES_FALSE);

    /* This ts has been invalidated, see if we should use the previous
     * time period */
    if (ts != server->metrics[i].ts ||
        server->metrics[i].total_count < MIN_COUNT_FOR_AVERAGE) {
      time_t prev_ts = ares_metric_timestamp(i, now, ARES_TRUE);
      if (prev_ts != server->metrics[i].prev_ts ||
          server->metrics[i].prev_total_count < MIN_COUNT_FOR_AVERAGE) {
        /* Move onto next bucket */
        continue;
      }
      /* Calculate average time for previous bucket */
      timeout_ms = (size_t)(server->metrics[i].prev_total_ms /
                            server->metrics[i].prev_total_count);
    } else {
      /* Calculate average time for current bucket */
      timeout_ms = (size_t)(server->metrics[i].total_ms /
                            server->metrics[i].total_count);
    }

    /* Multiply average by constant to get timeout value */
    timeout_ms *= AVG_TIMEOUT_MULTIPLIER;
    break;
  }

  /* If we're here with no timeout, no bucket had a usable average (e.g. it's
   * the first query for the server), so we just use the initial default
   * timeout */
  if (timeout_ms == 0) {
    timeout_ms = channel->timeout;
  }

  /* don't go below lower bounds */
  if (timeout_ms < MIN_TIMEOUT_MS) {
    timeout_ms = MIN_TIMEOUT_MS;
  }

  /* don't go above upper bounds */
  if (channel->maxtimeout && timeout_ms > channel->maxtimeout) {
    timeout_ms = (size_t)channel->maxtimeout;
  } else if (timeout_ms > MAX_TIMEOUT_MS) {
    timeout_ms = MAX_TIMEOUT_MS;
  }

  return timeout_ms;
}