Here is an example of using SWQL to build a view that displays problematic nodes (servers) with issues in one or more of the following areas:
• Node Status (column name: CONN) - (1 UP, 2 Down, ignore other status)
• Node Response Time (column name: M_SECS) - in milliseconds (> 0, or -1 when the node is Down). If M_SECS > 500: Warning, If M_SECS > 1000: Critical
• Node CPU Load (column name: C_LOAD) - in percentage (between 0 and 100). If C_LOAD > 95: Warning, If C_LOAD > 98: Critical, If C_LOAD = 100: Down
• Node Memory Usage (column name: R_LOAD) - in percentage (between 0 and 100). If R_LOAD > 95: Warning, If R_LOAD > 98: Critical, If R_LOAD = 100: Down
• Node Highest Volume Usage (column name: V_PERCENT) - in percentage (between 0 and 100). If V_PERCENT > 95: Warning, If V_PERCENT > 98: Critical, If V_PERCENT = 100: Down
• Node Hardware Components worst Status (column name: HW_Status) - (UP, Undefined, Unknown, Warning, Critical, n/a)
• Node Application worst Status (column name: APP_Status) - (UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a)
So that the worst (highest-priority) conditions are shown at the top of the list, I gave each status a different score and each column a different weight, then calculated the total weighted score as the priority. Here is the calculation:
• wConn (Connection), scores: Down - 1000, Up - 0; Weight 1.00
• wTime (Response Time), scores: >1000ms - 80, >500ms - 10, Other - 0; Weight 0.75
• wCPU (CPU Load), scores: 100% - 600, >98% - 80, >95% - 10, Other - 0; Weight 1.00
• wRAM (Memory Load), scores: 100% - 600, >98% - 80, >95% - 10, Other - 0; Weight 1.00
• wVol (MAX(Volume Usage)), the highest usage of all volumes on a node, scores: 100% - 600, >98% - 80, >95% - 10, Other - 0; Weight 0.75
• wHW (Hardware Status (worst value)), the worst hardware component status of a node with hardware monitoring enabled, scores: Critical - 80, Warning - 10, Up - 0, Other - 1; Weight 0.50
• wApp (Application Status (worst value)), the worst application status of a node with application monitors assigned, scores: Down - 600, Critical - 80, Warning - 10, Up - 0, Other - 1; Weight 0.50
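To make the score list above concrete, here is a minimal Python sketch of the lookups. It is only an illustration and not the attached SWQL; the function and table names (conn_score, time_score, usage_score, status_score, HW_SCORES, APP_SCORES) are made up for this example, and I am assuming status names are compared case-insensitively.

```python
# Illustrative only: mirrors the score list above, not the attached SWQL query.

def conn_score(conn):
    """wConn: CONN is 1 for Up and 2 for Down; only Down scores."""
    return 1000 if conn == 2 else 0

def time_score(m_secs):
    """wTime: response time in milliseconds (-1 when the node is Down)."""
    if m_secs > 1000:
        return 80
    if m_secs > 500:
        return 10
    return 0

def usage_score(percent):
    """wCPU, wRAM and wVol share the same thresholds (0-100 percent)."""
    if percent >= 100:
        return 600
    if percent > 98:
        return 80
    if percent > 95:
        return 10
    return 0

# Any status not listed here falls into "Other" and scores 1,
# following the list literally (this includes Unknown and n/a).
HW_SCORES = {"critical": 80, "warning": 10, "up": 0}
APP_SCORES = {"down": 600, "critical": 80, "warning": 10, "up": 0}

def status_score(status, table):
    """wHW / wApp: look up the worst status, defaulting to 1 for 'Other'."""
    return table.get(status.strip().lower(), 1)

if __name__ == "__main__":
    print(usage_score(99), time_score(750), status_score("Unknown", HW_SCORES))
```

If you change a threshold or score in the query, the same change would go into the matching function or table here.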
Maximum Total Weighted Score (excluding wConn): 80*0.75 + 600*1.00 + 600*1.00 + 600*0.75 + 80*0.50 + 600*0.50 = 2050
Priority = ROUND((t1.wTime*0.75 + t1.wCPU*1.0 + t1.wRAM*1.0 + t1.wVol*0.75 + t1.wHW*0.5 + t1.wApp*0.5)/2.05 + t1.wConn*1.00, 2)
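The weighted part of the Priority is scaled to between 0 and 1000 (2050 / 2.05 = 1000). A node that is Down gets another 1000 added through wConn, so a Down node never ranks below a node that is Up.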
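The Priority formula can be checked outside Orion with a few lines of Python. This is a sketch of the same arithmetic, not the query itself; the WEIGHTS table and the priority function are names I made up for the example, and the input is a plain dictionary of the already-computed scores (wConn, wTime, and so on).

```python
# Illustrative only: the same arithmetic as the Priority formula above.

# Weights per score column, excluding wConn (which is added unscaled).
WEIGHTS = {
    "wTime": 0.75,
    "wCPU": 1.0,
    "wRAM": 1.0,
    "wVol": 0.75,
    "wHW": 0.5,
    "wApp": 0.5,
}

def priority(scores):
    """Weighted sum scaled by 2.05, plus the connection score."""
    weighted = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    return round(weighted / 2.05 + scores["wConn"] * 1.0, 2)

if __name__ == "__main__":
    # A node that is Up, with CPU above 98%, response time above 500 ms,
    # and an application in Warning; everything else is fine.
    node = {"wConn": 0, "wTime": 10, "wCPU": 80, "wRAM": 0,
            "wVol": 0, "wHW": 0, "wApp": 10}
    # (10*0.75 + 80*1.0 + 10*0.5) / 2.05 = 92.5 / 2.05
    print(priority(node))  # prints 45.12
```

Dividing by 2.05 is what maps the maximum weighted score of 2050 onto 1000, so if you change any score or weight you would also need to recompute that divisor.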
You can change the scores and weights to meet your requirements.
Steps:
- Create a view and add a “Custom Query” resource.
- In the view, edit the Custom Query resource:
- In the Custom SWQL Query box, paste the code from the attached file “thwack-swql-alerts.txt”
- Enable search, and in the Search SWQL Query box, paste the code from the attached file “thwack-swql-alerts-withSearch.txt”
Done!
Using Search:
• By Node Name
If you want to display only one node, or a group of nodes with similar names, type the node name or part of the name in the search box and click the search button.
• By Connection Status
If you want to display only nodes in DOWN status, type “n 1” (with a space between n and 1) in the search box and click the search button.
• By CPU or RAM or Volume usage
If you want to display only nodes with CPU, RAM, or Volume usage above a certain level, use the following:
o “c 80” (CPU usage above 80%)
o “r 80” (Memory usage above 80%)
o “v 80” (Volume usage above 80%)
• By Hardware Status
If you want to display only nodes with a certain hardware status, type “h status” (‘status’ can be one of the following: UP, Undefined, Unknown, Warning, Critical, n/a).
• By Application Status
If you want to display only nodes with a certain application status, type “a status” (‘status’ can be one of the following: UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a).
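The search syntax above boils down to a one-letter prefix followed by a value. The Python sketch below shows one way to read such a search string; it is a hypothetical illustration, not how the attached search query is written. The parse_search function, the PREFIXES table, the "contains" marker, and the NODE_NAME placeholder are all assumptions for the example, and the value after “n” is passed through unchanged because its meaning is defined by the attached query.

```python
# Illustrative only: a hypothetical parser for the search strings described above.

# prefix -> (column it filters on, comparison used)
PREFIXES = {
    "n": ("CONN", "="),
    "c": ("C_LOAD", ">"),
    "r": ("R_LOAD", ">"),
    "v": ("V_PERCENT", ">"),
    "h": ("HW_Status", "="),
    "a": ("APP_Status", "="),
}
NUMERIC = {"n", "c", "r", "v"}  # prefixes whose value must be a number

def parse_search(text):
    """Return (column, comparison, value) for a search string."""
    text = text.strip()
    parts = text.split(None, 1)  # split on the first run of white space
    if len(parts) == 2 and parts[0].lower() in PREFIXES:
        key, value = parts[0].lower(), parts[1].strip()
        column, comparison = PREFIXES[key]
        if key not in NUMERIC:
            return (column, comparison, value)
        if value.isdigit():
            return (column, comparison, int(value))
    # Anything else is treated as a node name (or part of one).
    # NODE_NAME is a placeholder, not a column from the attached query.
    return ("NODE_NAME", "contains", text)

if __name__ == "__main__":
    for term in ["web01", "c 80", "h Warning", "a n/a"]:
        print(term, "->", parse_search(term))
```

A term that does not match a prefix pattern, such as a plain server name, falls through to the node-name search.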
You can customise the query to meet your requirements.
Thanks to Alex Soul for his post https://thwack.solarwinds.com/docs/DOC-174568, which was very helpful!
===========================
Update: As Alex suggested, I have updated the query and new files are attached. Thanks Alex!
===========================
Update: 11/March/2015
I have added two additional columns for the Alert Prioritising Dashboard.
One column is AlertTime and the other is Acknowledge (Ack). The Ack column is clickable: right-click it and open it in a new window to view or acknowledge an alert.
Please see the additional document at https://thwack.solarwinds.com/docs/DOC-176727
============================
Update: 11/11/2015
The original query is for NPM & SAM, but if you only need the NPM (network nodes) part, I have created two more queries for network devices only.
The files "networkNOC-ForThwack.txt" and "InterfaceNOC-ForThwack.txt" are attached.
"networkNOC-ForThwack.txt" is for network devices (NPM) only.
"InterfaceNOC-ForThwack.txt" is for network interfaces only.
Both are limited to Vendor = 'Cisco'; you can change that to meet your requirements.