Quantcast
Channel: THWACK: All Content - Network Performance Monitor
Viewing all articles
Browse latest Browse all 21870

Alert Prioritising Dashboard (SWQL) for Problematic Nodes (Servers)

$
0
0

Here is an example to use SWQL to build a view to display problematic nodes (servers) with issues from one or more flowing areas:

 

•    Node Status (column name: CONN) - (1 UP, 2 Down, ignore other status)

•    Node Response Time (column name: M_SECS) - in milliseconds, (> 0 OR When Node is Down, it is -1). If M_SECS> 500: Warning, If  M_SECS> 500: Critical

•    Node CPU Load (column name:  C_LOAD) - in percentage, (Between 0 - 100). If  C_LOAD > 95: Warning, If  C_LOAD > 98: Warning, If  C_LOAD =100: Down

•    Node Memory Usage (column name:  R_Load) - percentage, (Between 0 - 100). If  R_LOAD > 95: Warning, If  R_LOAD > 98: Warning, If  R_LOAD =100: Down

•    Node Highest Volume Usage (column name:  V_PERCENT) - (Between 0 - 100). If  V_PERCENT > 95: Warning, If  V_PERCENT > 98: Warning, If  V_PERCENT =100: Down

•    Node Hardware Components worst  Status (column name: HW_Status) -  (UP, Undefined, Unknown, Warning, Critical, n/a)

•    Node  Application worst Status (column name:  APP_Status)  - (UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a)

 

Alert-swql.jpg

 

In order to the worst (highest priority) condition are shown on the top of the list I gave each status different scores, and each column different weights. Then calculate total score as the priority. Here is the calculation:


•    wConn (Connection), scores: Down - 1000, Up - 0; weight 1.00
•    wTime (Response Time), scores:  > 1000ms - 80, >500ms - 10, other - 0; Weight 0.75
•    wCPU (CPU Load), scores: 100% - 600, >98% - 80, >95% - 10, Other - 0; Weight 1.00
•    wRAM (Memory Load), scores: 100% - 600, >98% - 80, >95% - 10, Other 0; Weight 1.00
•    wVol (MAX(Volume Usage)), the highest volume usage of all volumes on a node, scores: 100% - 600, >98% - 80, >95% - 10, Other 0; Weight 0.75
•    wHW  (Hardware Status (worst Value)), the worst HW component status of a node with HW monitor enabled   scores: Critical - 80, Warning - 10, Up - 0, other 1; Weight 0.50
•    wApp (Application Status (worst value), the worst application statues of a node with application monitors assigned.  scores: Down - 600, Critical - 80, Warning - 10, Up - 0, other 1; Weight 0.50

 

Maximum Total Weighted Score (Exclude wConn):  80*0.75 + 600*1.00 + 600*1.00 + 600*0.75 + 80*0.50 + 600 *0.50  = 2050

 

Priority = ROUND((t1.wTime*0.75 + t1.wCPU*1.0 + t1.wRAM*1.0 + t1.wVol*0.75 + t1.wHW*0.5 + t1.wApp*0.5)/2.05 + t1.wConn*1.00, 2)

 

Final Priority value is between 0 and 1000.

 

You can change the score and weight to meeting your requirement.

 

Steps:

  • Create a view; add “Custom Query” resource.

swql.jpg

 

  • In the view, edit Custom Query:
  • In the Custom SWQL Query box, add the codes in attached file “thwack-swql-alerts.txt”
  • Enable search, and in Search SWQL Query box, add the codes in attached file “thwack-swql-alerts-withSearch.txt”

 

Done!

 

Using Search:

 

•    By Node Name
If you want to just display a node or a group of nodes with similar names, type node name or part of the name in the search box and click search button.
•    By Connection Status
If you want to just display nodes in DOWN status, type “n 1” (white space between n and 1) in the search box and click search button.
•    By CPU or RAM or Volume usage
If you want to just display node with CPU or RAM or Volume usage above certain level, using the following:

     o    “c 80”  (CPU usage above 80%)
     o    “r 80” (Memory usage above 80%)
     o    “v 80” (Volume usage above 80%)
•    By Hardware Status
If you want to just display node with certain hardware status, type “h status” (‘status’ can be one of the following: UP, undefined, Unknown, Warning, Critical, n/a).
•    By Application Status
If you want to just display node with certain application status, type “a status” (‘status‘ can be one of the following: UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a).

 

You can customise the query to meeting your requirements.

 

Thanks Alex Soul's post https://thwack.solarwinds.com/docs/DOC-174568, which is very helpful!

 

===========================

 

As Alex suggested, I have updated the query and new files are attached. Thanks Alex!


Viewing all articles
Browse latest Browse all 21870

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>