UDP write may fail indicating host isn't reachable (#821)

UDP is connectionless, but systems use ICMP unreachable messages to
indicate there is no ability to reach the host or port, which can result
in a `send()` returning an error like `ECONNREFUSED`. We need to handle
non-retryable codes like that to treat it as a connection failure so we
requeue any queries on that connection to another connection/server
immediately. Otherwise what happens is we just wait on the timeout to
expire which can greatly increase the time required to get a definitive
message.

This also adds a test case to verify the behavior.

Fixes #819
Fix By: Brad Houes (@bradh352)
v1.29
Brad House 4 months ago
parent 4e9b6f7806
commit 4f9ac6a4d0
  1. 23
      src/lib/ares_process.c
  2. 20
      test/ares-test-mock-et.cc

@ -1114,9 +1114,28 @@ ares_status_t ares__send_query(struct query *query, struct timeval *now)
}
conn = ares__llist_node_val(node);
if (ares__socket_write(channel, conn->fd, query->qbuf, query->qlen) == -1) {
/* FIXME: Handle EAGAIN here since it likely can happen. */
status = ARES_ESERVFAIL;
if (try_again(SOCKERRNO)) {
status = ARES_ESERVFAIL;
} else {
status = ARES_ECONNREFUSED;
}
if (status == ARES_ECONNREFUSED) {
handle_conn_error(conn, ARES_TRUE, status);
/* This query wasn't yet bound to the connection, need to manually
* requeue it and return an appropriate error */
status = ares__requeue_query(query, now, status);
if (status == ARES_ETIMEOUT) {
status = ARES_ECONNREFUSED;
}
return status;
}
/* FIXME: Handle EAGAIN here since it likely can happen. Right now we
* just requeue to a different server/connection. */
server_increment_failures(server, query->using_tcp);
status = ares__requeue_query(query, now, status);

@ -129,11 +129,25 @@ TEST_P(MockUDPEventThreadTest, BadLoopbackServerNoTimeouts) {
Process();
for (size_t i=0; i<BADLOOPBACK_TESTCNT; i++) {
EXPECT_TRUE(result[i].done_);
#if 0
/* This test relies on the ICMP unreachable packet coming back on UDP connections
* when there is no listener on the other end. Most OS's handle this properly,
* but not all. For instance, Solaris 11 seems to not be compliant (it
* does however honor it sometimes, just not always) so while we still run
* the test, we don't do a strict validation of the result.
*
* Windows also appears to have intermittent issues, AppVeyor fails but GitHub Actions
* succeeds, which seems strange. This test goes to loopback so the network
* it resides on shouldn't matter.
*
* This test is really just testing an optimization, UDP is connectionless so you
* should expect most connections to rely on timeouts and not ICMP unreachable.
*/
# if defined(__sun) || defined(_WIN32)
EXPECT_TRUE(result[i].status_ == ARES_ECONNREFUSED || result[i].status_ == ARES_ETIMEOUT || result[i].status_ == ARES_ESERVFAIL);
# else
EXPECT_EQ(ARES_ECONNREFUSED, result[i].status_);
EXPECT_EQ(0, result[i].timeouts_);
#else
EXPECT_TRUE(result[i].status_ == ARES_ECONNREFUSED || result[i].status_ == ARES_ETIMEOUT || result[i].status_ == ARES_ESERVFAIL);
#endif
}
}

Loading…
Cancel
Save