When setting up alerting for delegates, this is our standard way of checking and alerting on them.
The verify delegate connection job runs every 5 minutes, and the AlertType DelegatesDown has pendingCount = 2.
For example, suppose a delegate goes down at 0 mins.
The alert check job runs after 5 mins: an alert is created in Pending state with trigger count = 0.
The trigger count is then incremented in the db to 1 (no notification is sent).
The job runs again after another 5 mins: trigger count = 1.
The existing alert’s trigger count is updated in the db to 2 (still no notification).
The job runs again after another 5 mins: now triggerCount = 2, which is equal to pendingCount.
A notification is now sent in the UI and via email, because an alert notification is only sent when trigger count >= pendingCount of the alert type. In this example that is 15 mins after the delegate went down.
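The sequence above can be sketched in a few lines of Python. This is only an illustration of the counting logic as described here, not the actual implementation: the Alert class, run_check and minutes_until_first_notification are hypothetical names, and what happens on runs after the first notification is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PENDING_COUNT = 2        # pendingCount of the DelegatesDown alert type (from the text)
CHECK_INTERVAL_MIN = 5   # minutes between check job runs (from the text)


@dataclass
class Alert:
    # Hypothetical stand-in for the alert record kept in the db.
    status: str = "Pending"
    trigger_count: int = 0


def run_check(alert: Optional[Alert],
              pending_count: int = PENDING_COUNT) -> Tuple[Alert, bool]:
    """One job run while the delegate is down. Returns (alert, notification_sent)."""
    if alert is None:
        alert = Alert()  # first run: alert created in Pending state, trigger count 0
    if alert.trigger_count >= pending_count:
        return alert, True  # threshold reached: notification goes out
    alert.trigger_count += 1  # below threshold: only increment the count
    return alert, False


def minutes_until_first_notification(pending_count: int = PENDING_COUNT,
                                     interval: int = CHECK_INTERVAL_MIN) -> int:
    """Minutes from the delegate going down to the first notification."""
    alert = None
    minutes = 0
    while True:
        minutes += interval
        alert, sent = run_check(alert, pending_count)
        if sent:
            return minutes


print(minutes_until_first_notification())  # prints 15
```

With pendingCount = 2 and a 5 minute interval, the third run is the first one that sends a notification, which matches the example.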
The alert details show the time the alert was created, not how long the delegate has been down.
For example, if the alert says 9 hours, it means the alert was created 9 hours ago.
Additionally, alerts have a TTL of 14 days. When the TTL expires, the original alert is removed and, because the alerting condition is still met, the system creates another alert for it.
How long the delegate has been down is reflected in the “Last heartbeat” field on the delegate card.
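To make the difference between the two values concrete, here is a small Python sketch. The function names and the timestamps are made up for illustration; they are not fields or values from the product.

```python
from datetime import datetime, timedelta


def alert_age(alert_created_at: datetime, now: datetime) -> timedelta:
    # What the alert details show: time since the alert was created.
    return now - alert_created_at


def delegate_downtime(last_heartbeat: datetime, now: datetime) -> timedelta:
    # What "Last heartbeat" reflects: time since the delegate was last seen.
    return now - last_heartbeat


# Illustrative values only: a delegate down for 15 days, whose original alert
# passed the 14 day TTL and was replaced by one created 9 hours ago.
now = datetime(2024, 1, 20, 12, 0)
last_heartbeat = now - timedelta(days=15)
alert_created_at = now - timedelta(hours=9)

print(alert_age(alert_created_at, now))        # 9:00:00
print(delegate_downtime(last_heartbeat, now))  # 15 days, 0:00:00
```

In a case like this the alert reads 9 hours even though the delegate has been down for 15 days, so “Last heartbeat” is the value to check for the real downtime.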