I have added 2 addition columns for Alert Prioritising Dashboard which is introduced in my previous document “Alert Prioritising Dashboard (SWQL) for Problematic Nodes (Servers)” (https://thwack.solarwinds.com/docs/DOC-176521)
One column is AlertTime, another one is Acknowledge (Ack). The Ack column is click-able. Right click it and open a new windows to View or Acknowledge an alert.
Here are the steps to add the 2 columns:
- Create an alert using "Advanced Alert Manager"
Alert Name: “SAM Alert - Nodes with Issues (SQL)”
Alert Description:
Generate an alert for a node if any condition below is met:
Node is Down
OR
Node is UP but AvgResponseTime > 500
OR
Node is UP but CPU Usage > 95%
OR
Node is UP but Memory Usage > 95%
OR
Node is UP but at least one Volume Usage > 95% (if apply)
OR
Node is UP but at least one Hardware component Status is in Warning or worse condition (if apply)
OR
Node is UP but at least one application monitor Status is Warning or worse condition (if apply)
Trigger Condition: (SQL)
LEFT JOIN
(
SELECT NodeID, MAX(VolumePercentUsed) AS vPercentage
FROM Volumes
WHERE VolumeType = 'Fixed Disk'
GROUP BY NodeID
) AS h ON h.NodeID = Nodes.NodeID
LEFT JOIN
(SELECT c.NodeID AS NodeID,
MAX(CASE
WHEN c.Status=1 THEN 0 -- UP
WHEN c.Status=17 THEN 1 -- - Undefined
WHEN c.Status=0 THEN 2 -- Unknown
WHEN c.Status=3 THEN 4 -- Warning
WHEN c.Status=14 THEN 5 -- Critical
ELSE NULL
END
) AS HWStatusID
FROM APM_HardwareCategoryStatus AS c
JOIN APM_HardwareItem AS e ON e.HardwareCategoryStatusID = c.ID
Group BY c.NodeID
) AS g ON g.NodeID = Nodes.NodeID
LEFT JOIN
(SELECT NodeID,
MAX(CASE
WHEN ApplicationAvailability='Up' THEN 0 -- UP
WHEN ApplicationAvailability='Unmanaged' THEN 1 -- Unmanaged
WHEN ApplicationAvailability='Unknown' THEN 2 -- Unknown
WHEN ApplicationAvailability='Unreachable' THEN 3 -- Unreachable
WHEN ApplicationAvailability='Warning' THEN 4 -- Warning
WHEN ApplicationAvailability='Critical' THEN 5 -- Critical
WHEN ApplicationAvailability='Down' THEN 6 -- Down
ELSE NULL
END
) AS aStatusID
FROM APM_CurrentStatusOfApplication
GROUP BY NodeID
) AS i ON i.NodeID = Nodes.NodeID
WHERE
(
Nodes.Status = 2
OR
(Nodes.Status=1 AND Nodes.AvgResponseTime > 500)
OR
(Nodes.Status=1 AND Nodes.CPULoad > 95)
OR
(Nodes.Status=1 AND Nodes.PercentMemoryUsed > 95)
OR
(Nodes.Status=1 AND h.vPercentage > 95.0 AND h.vPercentage < 100.0)
OR
(Nodes.Status=1 AND g.HWStatusID > 3)
OR
(Nodes.Status=1 AND i.aStatusID > 3)
)
Reset Condition: (SQL)
LEFT JOIN
(
SELECT NodeID, MAX(VolumePercentUsed) AS vPercentage
FROM Volumes
WHERE VolumeType = 'Fixed Disk'
GROUP BY NodeID
) AS h ON h.NodeID = Nodes.NodeID
LEFT JOIN
(SELECT c.NodeID AS NodeID,
MAX(CASE
WHEN c.Status=1 THEN 0 -- UP
WHEN c.Status=17 THEN 1 -- - Undefined
WHEN c.Status=0 THEN 2 -- Unknown
WHEN c.Status=3 THEN 4 -- Warning
WHEN c.Status=14 THEN 5 -- Critical
ELSE NULL
END
) AS HWStatusID
FROM APM_HardwareCategoryStatus AS c
JOIN APM_HardwareItem AS e ON e.HardwareCategoryStatusID = c.ID
Group BY c.NodeID
) AS g ON g.NodeID = Nodes.NodeID
LEFT JOIN
(SELECT NodeID,
MAX(CASE
WHEN ApplicationAvailability='Up' THEN 0 -- UP
WHEN ApplicationAvailability='Unmanaged' THEN 1 -- Unmanaged
WHEN ApplicationAvailability='Unknown' THEN 2 -- Unknown
WHEN ApplicationAvailability='Unreachable' THEN 3 -- Unreachable
WHEN ApplicationAvailability='Warning' THEN 4 -- Warning
WHEN ApplicationAvailability='Critical' THEN 5 -- Critical
WHEN ApplicationAvailability='Down' THEN 6 -- Down
ELSE NULL
END
) AS aStatusID
FROM APM_CurrentStatusOfApplication
GROUP BY NodeID
) AS i ON i.NodeID = Nodes.NodeID
WHERE
(
Nodes.Status=1
AND Nodes.AvgResponseTime < 500
AND Nodes.CPULoad < 95
AND (Nodes.PercentMemoryUsed < 95 OR Nodes.PercentMemoryUsed IS NULL)
AND (h.vPercentage < 95.0 OR h.vPercentage IS NULL)
AND (g.HWStatusID < 3 OR g.HWStatusID IS NULL)
AND (i.aStatusID < 4 OR i.aStatusID is NULL)
)
- Find Alert Definition ID:
If you import the attached alert definition file (SAM_Alert-NodesWithIssuesSQL.AlertDefinition), the AlertDefID is “14d4ba87-7ed3-4b0d-9119-b592b8cfacd8”.
If you created your own alert, using the method below to find out AlertDefID:
· Create a temporary summary view in Solarwinds Web Console, add a “Custom Query” Resource.
· Edit the “Custom Query”, add the SWQL query below:
SELECT AlertDefID, Name
FROM Orion.AlertDefinitions
WHERE Name = 'SAM Alert - Nodes with Issues (SQL)'
· Back to the view, and find out the AlertDefID
- Create Alert Dashboard:
Follow the steps in my previous document (https://thwack.solarwinds.com/docs/DOC-176521) to create the alert page
The new SWQL queries are attached.
If you import my Alert, the queries should work out-of-box. If you creates your own alert, please replace AlertDefID
“14d4ba87-7ed3-4b0d-9119-b592b8cfacd82 with your own one. (2 occurrences in the query)
Thanks