Alerting setup failed us when most needed - suggestions for improvement

Yesterday we had a midnight incident with a Cisco 3750 (it seems very choosy about the time window): all end hosts connected to the switch lost network connectivity, and the switch itself became unmanageable. We finally restored service by reloading the switch. SolarWinds did not send a single email alert during the whole outage. The network management setup failed at several levels.

1. Email alerting had stopped working a few days earlier, and I learned of it only after the incident. It appeared to be a permission issue, but I still cannot explain why the SMTP failure events were not highlighted in the Web console. I spend half my time staring into that console, and I am sure I would have noticed them. It is equally puzzling that the Alerting engine worked fine for more than a year and then suddenly became fussy about running under the LOCAL SYSTEM account.

2. Two hours before the loss of network connectivity, the switch stopped responding to SNMP polls. We found this out afterwards from the gap in CPU load data in the historical charts. However, the device was still pingable, so no node down event was recorded until the manual reload. To me, it seems the switch was already showing signs of trouble when it stopped responding to SNMP. Could we have noticed it earlier? Is it possible to send out email alerts when an SNMP poll fails?
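To make the condition in point 2 concrete, here is a rough sketch of the check I have in mind: "node answers ping, but has missed several SNMP polls in a row". It is plain Python, not a SolarWinds feature or API. The function name `snmp_silent`, the 10-minute polling interval, and the two-missed-polls threshold are all my own assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def snmp_silent(
    last_snmp_reply: Optional[datetime],
    now: datetime,
    ping_ok: bool,
    poll_interval: timedelta = timedelta(minutes=10),  # assumed interval
    missed_polls: int = 2,                             # assumed threshold
) -> bool:
    """Return True when the node is pingable but SNMP has gone quiet.

    A node that fails ping is left to the normal node-down alert,
    so this check only covers the "up but not answering SNMP" case.
    """
    if not ping_ok:
        return False
    if last_snmp_reply is None:
        # Never got an SNMP reply at all: treat as silent.
        return True
    return now - last_snmp_reply > poll_interval * missed_polls


# Example: last SNMP reply 25 minutes ago, ping still fine.
now = datetime(2024, 1, 1, 0, 0)
print(snmp_silent(now - timedelta(minutes=25), now, ping_ok=True))
```

With these assumed values the switch would have been flagged roughly twenty minutes after its last SNMP reply, instead of two hours later when the hosts dropped off.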

The greatest embarrassment for a monitoring system is when a user reports that the network is down and everyone looks surprised. Even more so when I constantly have to convince my team to tolerate several false alerts so that we do not miss any events, and then the system fails to throw any alert during a real incident.
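For the silent email failure in point 1, one idea is a small "heartbeat" script that sends a test message on a schedule and reports failure through something other than email. The sketch below uses only Python's standard `smtplib`. The function name `send_heartbeat`, the host and addresses, and the `smtp_factory` parameter (there only so the function can be exercised without a real mail server) are assumptions of mine. Note that it only exercises the path to the mail relay; it would not by itself catch a problem inside the Alerting engine.

```python
import smtplib
from email.message import EmailMessage


def send_heartbeat(host, port, sender, recipient, smtp_factory=smtplib.SMTP):
    """Send one test email; return True on success, False on any mail error."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alerting heartbeat"
    msg.set_content("If this message stops arriving, email alerting is broken.")
    try:
        # smtplib.SMTP works as a context manager and closes the session on exit.
        with smtp_factory(host, port, timeout=10) as smtp:
            smtp.send_message(msg)
        return True
    except (OSError, smtplib.SMTPException):
        # Covers refused connections, timeouts, and SMTP-level rejections.
        return False


if __name__ == "__main__":
    # Hypothetical relay and addresses; replace with real values.
    ok = send_heartbeat("mail.example.invalid", 25,
                        "nms@example.invalid", "netops@example.invalid")
    # A non-zero exit code lets a scheduler flag the failure without email.
    raise SystemExit(0 if ok else 1)
```

If the daily message stops arriving, or the scheduled task starts exiting with 1, that is the cue to check the mail path before the next real incident.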

Any suggestions for improvement?
