Currently health check failure events are only log if the HealthFlag for a host transition from non-FAILED_ACTIVE_HC to FAILED_ACTIVE_HC. However, since hosts are initialized in the FAILED_ACTIVE_HC state, hosts that never became healthy have no events associated with it.
Since the current health check events only log transitions, we'll have to scan the entire log in order to find the hosts in a current failing state. Then we'll still have to filter the hosts permanently removed from the cluster by the discovery service. This makes the events very difficult to use in operations.
Proposed solution
Both of these 2 issues can be solved by emitting a health check failure event if either of these conditions are true:
If the active health check failed and it's the first health check for a host. This ensures we have events for hosts that never became healthy.
If the active health check failed and a AlwaysLogFailures configuration is set to true, by default this flag is set to false. This makes it very easy to find the hosts currently failing by looking at the last few seconds of logs.
Signed-off-by: Henry Yang <hyang@lyft.com>
Mirrored from https://github.com/envoyproxy/envoy @ 11e196b67ee9124f33c45f5adf542841386e3c39
*Description*: PGV picks up unused imports in `api/envoy/data/core/v2alpha/health_check_event.proto`.
Error message is:
```
INFO: From ProtoGenValidateCcGenerate external/envoy_api/envoy/data/core/v2alpha/health_check_event.pb.validate.h:
envoy/data/core/v2alpha/health_check_event.proto: warning: Import envoy/api/v2/core/base.proto but not used.
envoy/data/core/v2alpha/health_check_event.proto: warning: Import google/protobuf/wrappers.proto but not used.
envoy/data/core/v2alpha/health_check_event.proto: warning: Import google/protobuf/duration.proto but not used.
```
*Risk Level*: Low
*Testing*: `bazel test //test/...` and running on local instances
*Docs Changes*: none required
*Release Notes*: none required
Signed-off-by: Michael Payne <michael@sooper.org>
Mirrored from https://github.com/envoyproxy/envoy @ c951e6088a5e1214c864448b0ccfd104bf2131ee
This PR adds timestamp field to the HealthCheckEvent message to allow
having it rendered inside the JSON serialized log of a health check
event.
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Mirrored from https://github.com/envoyproxy/envoy @ 8d0680feb074999c18998930b7f5f261f8f4a7a0