Alerting for node reboot where node rebooted between status intervals

Here is what I am trying to do. I need a way to catch nodes that entered fast poll but were never marked down under the polling interval. We set polling intervals by urgency (ICMP) and impact (SNMP), so some nodes reboot faster than they are polled. If a device is not classified to poll faster than its reboot cycle (think access switch), we can miss the outage entirely: the device misses its first status poll and drops into fast poll for 2 minutes, but it returns to service before the fast poll period runs out, so it never shows as down. We want to avoid alerting on warning status alone, because that would catch every node that misses a single poll, not just the nodes that actually rebooted.

Our ICMP intervals are 60, 90, 120 and 150 seconds (urgency 1 to 4) and our SNMP polling intervals are 5, 10 and 15 minutes (impact 1 to 3). The worst case scenario is that a node:

1) Responds to an ICMP request and then fails right afterward

2) The polling engine waits 150 seconds to poll again, and that poll misses (maybe)

3) The node starts fast polling for 120 seconds (default)

4) The node responds on the last poll of the fast poll period, for a total of 270 seconds down without ever being marked down.
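To make the arithmetic explicit, here is a minimal Python sketch of the worst-case window for each urgency level. The function and constant names are made up for illustration; the intervals and the 120-second fast poll default are the ones listed above.

```python
# ICMP status polling intervals in seconds, keyed by urgency (values from this post).
STATUS_INTERVAL_BY_URGENCY = {1: 60, 2: 90, 3: 120, 4: 150}

# Default fast poll period in seconds, as stated above.
FAST_POLL_PERIOD = 120


def worst_case_unseen_outage(status_interval: int,
                             fast_poll_period: int = FAST_POLL_PERIOD) -> int:
    """Longest outage (in seconds) that can end before the node is marked down.

    The node fails right after a good poll, stays dark for one full status
    interval, then answers on the last poll of the fast poll period.
    """
    return status_interval + fast_poll_period


for urgency, interval in sorted(STATUS_INTERVAL_BY_URGENCY.items()):
    print(f"urgency {urgency}: up to {worst_case_unseen_outage(interval)} s unseen")
```

So any device that can reboot and answer pings inside that window for its urgency level can slip through without a node down event.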

We had been alerting on node reboot in addition to node down, but node reboot is based on an SNMP service restart, so every time we made a change to the SNMP daemon on a Linux box we got an erroneous node down notification. We noticed the problem on low priority switches that appeared to be rebooting between polling intervals and coming back up to a pingable state faster than the status interval plus the fast polling period.

Any ideas?

I know that we could a) reduce the fast polling period or b) change the urgency of a node, but the latter would also change the node's priority for escalation, so it isn't really an option. I'm just wondering if there are any alert configurations out there that capture this scenario. Last reboot works, but I have to wait for the node to come back up before I can alert, and that could be a multi-minute delay.
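For reference, the last-reboot check I have in mind looks roughly like the Python sketch below. Everything in it is hypothetical: the function name, the parameters, and the idea that the warning window and the last boot time are available as plain datetimes are assumptions for illustration, not something the monitoring platform exposes in this form.

```python
from datetime import datetime, timedelta


def rebooted_during_warning(warning_start: datetime,
                            warning_end: datetime,
                            last_boot: datetime,
                            status_interval: timedelta = timedelta(seconds=150)) -> bool:
    """Return True if the reported last boot lines up with a warning window.

    The node may have gone down (and started booting) up to one status
    interval before the first missed poll, so the window is widened by
    that amount on the early side.
    """
    return warning_start - status_interval <= last_boot <= warning_end


# Example: node went into warning at 10:00:00 and recovered at 10:01:50.
start = datetime(2024, 1, 1, 10, 0, 0)
end = start + timedelta(seconds=110)
print(rebooted_during_warning(start, end, start + timedelta(seconds=40)))  # boot inside window
print(rebooted_during_warning(start, end, start - timedelta(days=30)))     # old boot, just a missed poll
```

The catch is the one already mentioned: the boot time is only known once the node answers again, so this can only fire after recovery.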

Thanks,

Josh
