[server] Keep server running after failed UDS `getpeername` (#35823)

Closes #35076. See discussion there for context.

Previously, if `getpeername` failed on a recently-accepted Unix domain socket, the server would shut down. It's unclear if there's any valid reason `getpeername` would fail here, we could be looking at system issues / data corruption, but it may be implementation-dependent. I did not find a man page that specified this behavior: getpeername on a server errors with `EBADF` when a client closes its end of a UDS connection; the server's fd should still be valid. I was not able to reproduce the failure on recent Linux/Mac systems where clients close their connection after a listener accepted (basic, non-gRPC test: listener accepts a connection and sleeps, client closes their end, then listener wakes up and calls getpeername).

If there are no valid reasons `getpeername` would fail in these spots, gRPC behavior is essentially undefined - we cannot defensively code around system failures in every place they may occur. If this _is_ a valid situation on some platforms, then this fix keeps the server alive as one would hope. There are arguments to be made for shutting down servers if the system is unstable, but I can't find an objectively correct behavior here. For those reasons, I think it may be best to keep the server running, which is what this PR does.

Closes #35823

COPYBARA_INTEGRATE_REVIEW=https://github.com/grpc/grpc/pull/35823 from drfloob:abort-listen-on-failed-getpeername 29e723c5bb
PiperOrigin-RevId: 604694057
pull/35821/head^2
AJ Heller 10 months ago committed by Copybara-Service
parent 3b2a749f19
commit 3ddc48bf62
  1. 1
      src/core/BUILD
  2. 13
      src/core/lib/event_engine/posix_engine/posix_engine_listener.cc
  3. 6
      src/core/lib/iomgr/tcp_server_posix.cc

@ -2188,6 +2188,7 @@ grpc_cc_library(
"posix_event_engine_tcp_socket_utils",
"socket_mutator",
"status_helper",
"strerror",
"time",
"//:event_engine_base_hdrs",
"//:exec_ctx",

@ -45,6 +45,7 @@
#include "src/core/lib/event_engine/posix_engine/tcp_socket_utils.h"
#include "src/core/lib/event_engine/tcp_socket_utils.h"
#include "src/core/lib/gprpp/status_helper.h"
#include "src/core/lib/gprpp/strerror.h"
#include "src/core/lib/gprpp/time.h"
#include "src/core/lib/iomgr/socket_mutator.h"
@ -173,7 +174,7 @@ void PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept(
return;
default:
gpr_log(GPR_ERROR, "Closing acceptor. Failed accept4: %s",
strerror(errno));
grpc_core::StrError(errno).c_str());
// Shutting down the acceptor. Unref the ref grabbed in
// AsyncConnectionAcceptor::Start().
Unref();
@ -189,15 +190,13 @@ void PosixEngineListenerImpl::AsyncConnectionAcceptor::NotifyOnAccept(
auto listener_addr_uri = ResolvedAddressToURI(socket_.addr);
gpr_log(
GPR_ERROR,
"Failed getpeername: %s. This is a critical failure, the "
"listener on %s:%d is shutting down.",
strerror(errno),
"Failed getpeername: %s. Dropping the connection, and continuing "
"to listen on %s:%d.",
grpc_core::StrError(errno).c_str(),
listener_addr_uri.ok() ? listener_addr_uri->c_str() : "<unknown>",
socket_.port);
close(fd);
// Shutting down the acceptor. Unref the ref grabbed in
// AsyncConnectionAcceptor::Start().
Unref();
handle_->NotifyOnRead(notify_on_accept_);
return;
}
addr = EventEngine::ResolvedAddress(addr.address(), len);

@ -429,13 +429,13 @@ static void on_read(void* arg, grpc_error_handle err) {
auto listener_addr_uri = grpc_sockaddr_to_uri(&sp->addr);
gpr_log(
GPR_ERROR,
"Failed getpeername: %s. This is a critical failure, the "
"listener on %s:%d is shutting down.",
"Failed getpeername: %s. Dropping the connection, and continuing "
"to listen on %s:%d.",
grpc_core::StrError(errno).c_str(),
listener_addr_uri.ok() ? listener_addr_uri->c_str() : "<unknown>",
sp->port);
close(fd);
goto error;
continue;
}
}

Loading…
Cancel
Save