Channel: THWACK: All Content - Network Performance Monitor

NPM: Mass Outage Alerting- Over 50 nodes down, Send Alert

Hello,

I have created an alert that notifies the necessary team members when 50 or more nodes are down, but I have run into some issues. I have already opened a support case, and support recommended that I reach out here. The general flow is that the alert triggers when 50 or more nodes are down at a given time and resets when fewer than 50 stores are down. The alert is evaluated every 5 minutes and is powered by the following SQL queries:

The Trigger Query is as follows:

(I had to use TOP 1 in the SELECT statement; without it I would receive an alert email for every node that is down, and we want a single email stating that 50 or more stores are down.)

SELECT TOP 1 Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
FROM Nodes
WHERE StatusDescription LIKE 'Node Status is Down.'
GROUP BY NodeID, Caption
HAVING (SELECT COUNT(*)
        FROM Nodes
        WHERE StatusDescription LIKE 'Node Status is Down.') >= 50

The Reset Query:

(I tried adding TOP 1 to the reset query as well, but whenever I change the reset query, the TOP 1 is automatically removed from the query stored in the database.)

SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name  -- a TOP 1 added here is stripped when the query is saved
FROM Nodes
WHERE StatusDescription LIKE 'Node Status is Down.'
GROUP BY NodeID, Caption
HAVING (SELECT COUNT(*)
        FROM Nodes
        WHERE StatusDescription LIKE 'Node Status is Down.') < 50

I have tested many different options, and this seems to be the best solution so far. The trigger action executes successfully whenever 50 or more stores are down, and if fewer than 50 stores are down the next time the query runs (every 5 minutes), the reset action works correctly. However, if the trigger action has executed and 50 or more stores are still down the next time the query runs, the alert sends another trigger action, and we then never receive the reset email. I believe this happens when the node returned by TOP 1 changes while 50 or more stores are still down. Any help is appreciated!
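
One possible way to test the TOP 1 hypothesis is a trigger query that always returns the same row, so the alert stays attached to a single object for the whole outage. The following is a minimal, untested sketch. It assumes (and this is only an assumption) that the alert engine tracks triggered state per returned NetObjectID, and it arbitrarily anchors the alert to the node with the lowest NodeID; any fixed node would serve the same purpose. Only the Nodes table and the columns already used in the queries above appear here.

```sql
-- Sketch only: anchor the alert to one fixed node instead of TOP 1 of the down nodes.
-- Assumption: alert state is tracked per NetObjectID, so a stable row avoids re-triggering.
SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
FROM Nodes
WHERE Nodes.NodeID = (SELECT MIN(NodeID) FROM Nodes)   -- arbitrary but stable anchor node
  AND (SELECT COUNT(*)
       FROM Nodes
       WHERE StatusDescription LIKE 'Node Status is Down.') >= 50
```

The matching reset query would keep the same anchor condition and change >= 50 to < 50. Because the anchor row is selected regardless of its own status, neither query would need TOP 1.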

The next suggestion from SolarWinds support is to submit a feature request so that each node group exposes a value for the total number of stores down within that group. Any thoughts on this feature request?

Thanks,

Troy

