Introduction
After a long search, I found the proper syntax (thank you, Thwack community) to correlate received SNMP traps with other alert values. Here is a short guide to using traps in alerts within the SolarWinds NPM GUI.
In this example, I receive an SNMP "dying gasp" trap from an Alcatel-Lucent (now Nokia) 7210 SAS-D. When this happens, the equipment is telling me that it has lost power. This lets me separate nodes that went down because of a network failure from nodes that lost power. In other words, I only take action if a node is down because of the network; there isn't much I can do about power at those remote locations or on customer premises.
Using Node Custom Properties
It all starts with a Boolean custom property on the nodes, which I called LossOfPower. See the attached picture for more details.
SNMP Traps
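The 7210 must be configured to send its traps to SolarWinds. In the configuration below, trap group 1 points at the SolarWinds server, and the snmp-dying-gasp line names it as the primary target, with two other servers as the secondary and tertiary targets.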
snmp-trap-group 1
    description "SolarWinds 1"
    trap-target "solarwinds1" address <SolarWinds NPM Server IP> snmpv2c notify-community "CatchyNameHere"
exit
snmp-trap-group 98
    description "OtherSNMPServers"
    trap-target "Server1" address <Server1 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
    trap-target "Server2" address <Server2 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
exit
snmp-dying-gasp primary 1 "solarwinds1" secondary 98 "Server1" tertiary 98 "Server2"
The next step is to create a new alert that sets this property. Its conditions are written in SQL, not SWQL.
Trigger
SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
INNER JOIN Traps
ON Nodes.NodeID = Traps.NodeID
AND Traps.DateTime > DATEADD(MINUTE, -6, SYSDATETIME())
AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp';
The INNER JOIN matches rows of the Nodes and Traps tables on NodeID. The DateTime condition acts as a time window: only dying gasp traps received in the last 6 minutes are considered.
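To make the trigger logic concrete outside of SolarWinds, here is a minimal Python sketch of the same idea. The Node and Trap classes and the trigger_matches function are hypothetical stand-ins for rows of the Nodes and Traps tables, not a SolarWinds API. One difference: the raw INNER JOIN returns one row per matching trap, while the set below reports each node only once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DYING_GASP = "TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp"

# Hypothetical stand-ins for rows of the Nodes and Traps tables.
@dataclass
class Node:
    node_id: int
    caption: str

@dataclass
class Trap:
    node_id: int
    trap_type: str
    received: datetime

def trigger_matches(nodes, traps, now, window=timedelta(minutes=6)):
    """Return (NodeID, Caption) for nodes with a dying gasp newer than now - window."""
    cutoff = now - window
    # Equivalent of the JOIN on NodeID plus the DateTime and TrapType filters.
    gasp_nodes = {t.node_id for t in traps
                  if t.trap_type == DYING_GASP and t.received > cutoff}
    return [(n.node_id, n.caption) for n in nodes if n.node_id in gasp_nodes]

now = datetime(2024, 1, 1, 12, 0)
nodes = [Node(1, "site-a"), Node(2, "site-b")]
traps = [
    Trap(1, DYING_GASP, now - timedelta(minutes=2)),   # inside the 6-minute window
    Trap(2, DYING_GASP, now - timedelta(minutes=30)),  # too old to count
]
print(trigger_matches(nodes, traps, now))  # [(1, 'site-a')]
```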
Reset
SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
INNER JOIN Traps
ON Nodes.NodeID = Traps.NodeID
AND Traps.DateTime < DATEADD(MINUTE, -9, SYSDATETIME())
AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp'
AND Nodes.Status = 1;
If a dying gasp trap for the node is more than 9 minutes old and the node is back up (Status = 1 means Up), the alert is reset.
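The reset rule can be sketched the same way. should_reset is a hypothetical helper, not a SolarWinds function, and it deliberately differs from the SQL in one respect. The SQL matches any dying gasp older than 9 minutes, so an older entry left in the Traps log can satisfy it as soon as the node is Up again. The sketch compares against the most recent dying gasp only, which is one possible way to keep the 9-minute hold.

```python
from datetime import datetime, timedelta

STATUS_UP = 1  # Orion Nodes.Status value for Up (2 is Down)

def should_reset(node_status, gasp_times, now, hold=timedelta(minutes=9)):
    """Reset when the node is Up and its latest dying gasp is older than `hold`.

    gasp_times holds the receive times of this node's dying gasp traps.
    """
    if node_status != STATUS_UP or not gasp_times:
        return False
    return max(gasp_times) < now - hold

now = datetime(2024, 1, 1, 12, 0)
last_week = now - timedelta(days=7)
recent = now - timedelta(minutes=3)
print(should_reset(1, [last_week], now))          # True: old gasp, node Up
print(should_reset(1, [last_week, recent], now))  # False: latest gasp is 3 minutes old
print(should_reset(2, [last_week], now))          # False: node still Down
```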
Trigger Action
It simply sets the node's LossOfPower custom property to "Yes".
Reset Action
Set the node's LossOfPower custom property back to "No".
Usage
This design is modular. The LossOfPower property is used in another, much simpler alert (it could be used in several other alerts as well) that notifies us when a node is down. If the node is down and LossOfPower is set, we do nothing. If it is down for any other reason, we take action.
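The decision in that simpler alert comes down to two conditions combined with AND. The NodeState class and needs_action function below are hypothetical, meant only to show the logic, not how SolarWinds evaluates alerts internally.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    caption: str
    is_down: bool
    loss_of_power: bool  # value of the LossOfPower custom property

def needs_action(node: NodeState) -> bool:
    """Act only when the node is down for a reason other than power loss."""
    return node.is_down and not node.loss_of_power

nodes = [
    NodeState("site-a", is_down=True, loss_of_power=True),    # power outage: ignore
    NodeState("site-b", is_down=True, loss_of_power=False),   # network issue: act
    NodeState("site-c", is_down=False, loss_of_power=False),  # healthy
]
print([n.caption for n in nodes if needs_action(n)])  # ['site-b']
```

Because the two alerts are evaluated independently, the node-down alert may fire before LossOfPower has been set, so a short delay on that alert may be worth considering.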
Testing and Researching
To list all the properties of a table, SolarWinds NPM includes a query test page at http://<yourserverIP>/Orion/Admin/swis.aspx. Note that the names there differ slightly from the database table names (for example, Orion.Traps instead of Traps).
If Orion.Traps is selected as a source, the Generate Select Query button returns this:
SELECT Acknowledged, ColorCode, Community, DateTime, Description, DisplayName, EngineID, Hostname, InstanceType, IPAddress, NodeID, ObservationRowVersion, ObservationSeverity, ObservationSeverityName, ObservationTimestamp, Tag, TimeStamp, TrapID, TrapType, Uri FROM Orion.Traps
This is useful for finding fields you might need in your own alerts.
You can remove fields from the SELECT and run the query to see what is returned. This is impractical with traps, though: Orion.Traps is a log of every trap received, so the result can be very long. Try it on Orion.Nodes instead.
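The same kind of query can also be run outside the test page. The sketch below assumes the SWIS REST endpoint used by the OrionSDK clients (HTTPS on port 17778, path /SolarWinds/InformationService/v3/Json/Query); check it against your installation. build_swql and swis_query_request are hypothetical helpers, and orion.example.com is a placeholder host. Limiting rows with TOP and sorting by DateTime keeps an Orion.Traps query to the most recent entries. The request is only prepared, not sent, and authentication and certificate handling are left out.

```python
import json
import urllib.request

# Assumption: SWIS REST Query endpoint as used by the OrionSDK clients.
SWIS_QUERY_URL = "https://{host}:17778/SolarWinds/InformationService/v3/Json/Query"

def build_swql(entity, fields, top=None, order_by=None):
    """Build a SWQL SELECT with optional TOP and ORDER BY clauses."""
    top_clause = f"TOP {int(top)} " if top is not None else ""
    query = f"SELECT {top_clause}{', '.join(fields)} FROM {entity}"
    if order_by:
        query += f" ORDER BY {order_by}"
    return query

def swis_query_request(host, query):
    """Prepare (but do not send) a POST request for the SWIS Query endpoint."""
    body = json.dumps({"query": query, "parameters": {}}).encode("utf-8")
    return urllib.request.Request(
        SWIS_QUERY_URL.format(host=host),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

q = build_swql("Orion.Traps", ["DateTime", "NodeID", "TrapType"],
               top=20, order_by="DateTime DESC")
print(q)  # SELECT TOP 20 DateTime, NodeID, TrapType FROM Orion.Traps ORDER BY DateTime DESC
req = swis_query_request("orion.example.com", q)
```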
SELECT AgentPort, Allow64BitCounters, AncestorDetailsUrls, AncestorDisplayNames, AvgResponseTime, BlockUntil, BufferBgMissThisHour, BufferBgMissToday, BufferHgMissThisHour, BufferHgMissToday, BufferLgMissThisHour, BufferLgMissToday, BufferMdMissThisHour, BufferMdMissToday, BufferNoMemThisHour, BufferNoMemToday, BufferSmMissThisHour, BufferSmMissToday, Caption, ChildStatus, CMTS, Community, Contact, CPULoad, CustomPollerLastStatisticsPoll, CustomPollerLastStatisticsPollSuccess, CustomStatus, Description, DetailsUrl, DisplayName, DNS, DynamicIP, EngineID, EntityType, External, GroupStatus, Icon, Image, InstanceType, IOSImage, IOSVersion, IP, IP_Address, IPAddress, IPAddressGUID, IPAddressType, IsServer, LastBoot, LastSync, LastSystemUpTimePollUtc, Location, MachineType, MaxResponseTime, MemoryAvailable, MemoryUsed, MinResponseTime, MinutesSinceLastSync, NextPoll, NextRediscovery, NodeDescription, NodeID, NodeName, ObjectSubType, OrionIdColumn, OrionIdPrefix, PercentLoss, PercentMemoryAvailable, PercentMemoryUsed, PollInterval, RediscoveryInterval, ResponseTime, RWCommunity, Severity, SkippedPollingCycles, SNMPVersion, StatCollection, Status, StatusDescription, StatusIcon, StatusIconHint, StatusLED, SysName, SysObjectID, SystemUpTime, TotalMemory, UiSeverity, UnManaged, UnManageFrom, UnManageUntil, Uri, Vendor, VendorIcon FROM Orion.Nodes
Using the SWIS Query test page will be the subject of another entry.
Regards,