Today's event started innocently enough with a poller server requiring a restart. This is not unusual. The environment I inherited is missing a lot of care and feeding, and the best place to start is in the log files.
On the problem poller, I found the following line in the C:\ProgramData\Solarwinds\Collector\Logs\DataProcessor.log file. And there are a lot of them.
2014-04-08 14:20:02,335 [29] WARN SolarWinds.NPM.Collector.BL.CustomPollerTransform - Unable to complete transform. Assign poller 'Array_Drive_Size' to this node. CustomPollerAssignmentID=4dc3141a-1c7a-4b77-8a2f-449346aa8b9b
While not overly useful to a human, the CustomPollerAssignmentID is unique and can be found in the database. SWQL Studio, from the SDK, is very useful for finding things in the database. Unfortunately, I had to hunt for the table where CustomPollerAssignmentID is the primary key. The table naming convention is good and well followed, and I found it in the Orion.NPM.CustomPollerAssignment table.
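To confirm you've found the right table, a lookup along these lines works in SWQL Studio. This is a sketch: the column names match the Orion.NPM.CustomPollerAssignment schema as SWQL Studio shows it in my environment, and the GUID is the CustomPollerAssignmentID from the WARN line above; adjust either if your version differs.

```sql
-- Find the assignment referenced in the log entry.
SELECT CustomPollerAssignmentID, CustomPollerID, NodeID
FROM Orion.NPM.CustomPollerAssignment
WHERE CustomPollerAssignmentID = '4dc3141a-1c7a-4b77-8a2f-449346aa8b9b'
```

If this returns a row, the orphaned assignment still exists; the NodeID it returns is what we chase next.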
So, in my environment, that table has about 23,000 records. That number seems far too high; in two years of administration, I've had to apply a custom poller only twice. I can see now why my log files have so many entries. A NodeID on its own isn't very useful either, so some trial and error with the SWQL syntax led me to a query that pulls in each node's Caption (its human-readable name). I also wanted to know which systems have the most assignments, to better target my cleanup effort.
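The query was something along these lines; treat it as a sketch. It joins the assignment table to Orion.Nodes for the Caption and counts assignments per node, busiest nodes first:

```sql
-- Count custom poller assignments per node, busiest first.
SELECT n.NodeID, n.Caption, COUNT(a.CustomPollerAssignmentID) AS Assignments
FROM Orion.NPM.CustomPollerAssignment a
INNER JOIN Orion.Nodes n ON n.NodeID = a.NodeID
GROUP BY n.NodeID, n.Caption
ORDER BY COUNT(a.CustomPollerAssignmentID) DESC
```

The nodes at the top of this list are where your cleanup effort pays off fastest.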
Now this is useful. Sorry for the blurs, but that information would only be useful in your own environment anyway. As you can see, 56 pollers on a single node, and that's 80% of the way down the list. But the NodeID column gives me another clue: since these nodes all have the same number of pollers (coincidentally the exact same pollers) and their IDs fall in a sequence, I believe someone ran a discovery with the wrong options. Discoveries aren't bad in themselves, but when you apply 56 pollers to every device unconditionally, problems will show up later.
A quick peek at the pollers for one of these nodes shows you what's on there. Go into your node list in the web interface, select a node, and click the Assign Pollers link. The node you select may or may not have pollers assigned, but we're going to tweak the URL. Add a 'Nodes' parameter to the end with the node ID of one of your suspects:
http://orion/Orion/NPM/NodeCustomPollers.aspx?ReturnTo=<truncated>&Nodes=2411
You'll get the familiar poller assignment page, scoped to that specific node. Here you can see what has already been applied and remove pollers if necessary.
Given that I want to go home today, we'll need a better way to clean up those nodes than one by one. Time for more fun with the URL: just add a comma-separated list of node IDs. Yes, it works!
http://orion/Orion/NPM/NodeCustomPollers.aspx?ReturnTo=<truncated>&Nodes=2411,2412,2416,2434,2435,2436
Groups of 20 to 30 work well for me. The page will show all of the devices by name, with a smattering of checked pollers. Uncheck them and click the Submit button. You may have to wait a bit for it to complete; after all, it's doing your cleaning for you. Repeat your query in SWQL Studio and watch your list get smaller.
I'm sure there are even more optimizations available, such as using SWQL and the SDK to delete the records directly. I hope the community can provide some of those solutions. But if you're seeing these errors in your logs, you've got some cleaning to do. This should give you a good example of using SWQL Studio and the web interface to fix up some problems. Oh, and if you have a node that genuinely needs pollers for Cisco, Juniper, Riverbed, UPS, Windows, Solaris, and Palo Alto, take a picture, because I want to see it.
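On the SWQL-delete idea: SWQL queries themselves are read-only, but every SWIS entity exposes a system Uri column, and those Uris are what the SDK's delete operation accepts. A sketch of the first half of that approach (the node IDs are examples from the URL above):

```sql
-- List the SWIS Uri of every assignment on the suspect nodes.
-- Each Uri can then be passed to the SDK's delete operation.
SELECT Uri
FROM Orion.NPM.CustomPollerAssignment
WHERE NodeID IN (2411, 2412, 2416)
```

Test this on one throwaway assignment before looping over thousands; deletes through the SDK don't come with an undo button.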