What do you do when you have nearly 11,000 nodes and 70,000 elements and someone asks you, "How are you sure that you are monitoring all of the appropriate interfaces and volumes on those nodes?"
We decided to roll out a scheduled discovery of our existing nodes. (Say that slowly to yourself and you'll realize how odd that sounds.)
We built a bunch of scheduled discoveries that were tied to the network security zones, so that we could specify the appropriate polling engine to use in the query, and then grouped the queries by WMI credentials. Why group by WMI credentials? We didn't want to pound on a box with the wrong creds once a week and scare the security teams. Plus it makes the discoveries faster.
A word of warning -- when displaying your discovery results you are limited to 1000 nodes. Make sure that each of your groups has 1000 nodes or fewer, or you risk having some re-work on your hands.
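Here is a minimal Python sketch of the grouping rule described above: bucket the nodes by security zone and WMI credential, then split each bucket so that no single discovery holds more than 1000 nodes. The input field names (`zone`, `credential`, `ip`) and the discovery naming scheme are assumptions made up for illustration, not anything the product defines.

```python
from collections import defaultdict

# Results-view limit mentioned above.
MAX_NODES_PER_DISCOVERY = 1000


def build_discovery_groups(nodes, max_size=MAX_NODES_PER_DISCOVERY):
    """Group nodes by (zone, credential), then split each group into batches.

    `nodes` is a list of dicts with hypothetical keys "zone", "credential"
    and "ip". Returns a dict mapping a discovery name to its list of IPs.
    """
    grouped = defaultdict(list)
    for node in nodes:
        grouped[(node["zone"], node["credential"])].append(node["ip"])

    batches = {}
    for (zone, credential), ips in sorted(grouped.items()):
        # Slice each group so that no batch exceeds max_size entries.
        for start in range(0, len(ips), max_size):
            part = start // max_size + 1
            name = f"{zone}-{credential}-{part:02d}"  # hypothetical naming scheme
            batches[name] = ips[start:start + max_size]
    return batches
```

Each key in the returned dict would correspond to one scheduled discovery, and its value to the list of IP addresses you would paste into that discovery.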
We populated our discoveries using the results of some simple SQL queries. We matched the queries (which we run manually right now, but could just as easily run via a report) with the schedule names so that we can manually update the list of IP addresses in each discovery on a scheduled basis. Another caveat here: when we modify a discovery, all of the previous discovery results are deleted until the discovery is run again. Make sure you've taken any appropriate actions (added or ignored nodes and/or elements) before you refresh your list of target IP addresses.
With our groups built we let the discoveries run and then started to analyze the results. Again, a few caveats:
1) When importing results you can micro-manage the interfaces but you cannot micro-manage the volumes that are added.
2) Even if you specify all of the appropriate interfaces and volumes, if the node displays an AppInsight element it will be added automagically. You can remove these via APM if you don't want to deploy them. (And we don't -- at least not right now.)
3) If you want to ignore a node you can do that, but it makes more sense to ignore certain resources. This process is time consuming. There is no easy way (that we know of) to select an individual element on a node and ignore it. If you select the entire node (left-most check box on the Scheduled Discovery Results view) then you will ignore the entire node and prevent future elements from being found. This defeats the purpose of this process.
After trying to streamline this process a couple of times we had a crazy idea. What if, instead of using the UI to parse out the elements we wanted to import, we did a direct edit of the DB and removed all of the elements that we didn't want to see in the discovery? The DB is nicely organized so that we can edit each individual element type in a single table:
DiscoveredNodes = List of nodes (which should equal the nodes we seeded from our SQL queries above)
DiscoveredInterfaces = List of discovered interfaces (this would allow us to quickly parse out those miniport adapters from our Windows boxes)
DiscoveredVolumes = List of discovered volumes (huge win here as we could define our search patterns by the lists our Windows and *Nix teams provided)
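To make the "search patterns" idea concrete, here is a rough Python sketch that sorts discovered interface or volume names into keep and drop lists using shell-style wildcards. The `*miniport*` pattern comes from the example above; the function name and everything else are hypothetical, and the sketch works on plain lists of names rather than touching the database.

```python
from fnmatch import fnmatchcase


def split_by_patterns(names, unwanted_patterns):
    """Split names into (keep, drop) using case-insensitive wildcard patterns."""
    lowered_patterns = [pattern.lower() for pattern in unwanted_patterns]
    keep, drop = [], []
    for name in names:
        lowered = name.lower()
        # Lowercase both sides so matching does not depend on the platform.
        if any(fnmatchcase(lowered, pattern) for pattern in lowered_patterns):
            drop.append(name)
        else:
            keep.append(name)
    return keep, drop
```

Reviewing the drop list before deleting anything seems like the safer order of operations, since a pattern that is too broad would silently hide elements you actually want.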
The key to unlocking all of this goodness is the DiscoveryProfiles table which contains the list of all of our scheduled discoveries. The DiscoveredObjectID is unique only within the ProfileID that discovered it. You can check the DiscoveredNetObjectStatuses table for a mapping of ProfileID to DiscoveredObjectID to ManagedNetObjectID where ManagedNetObjectID = Nodes.NodeID.
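The practical consequence of that uniqueness rule is that any lookup has to use ProfileID and DiscoveredObjectID together. The sketch below shows this against an in-memory SQLite stand-in, not the real database. The table names and the three mapping columns come from the description above, while the `Caption` column, the sample rows and the function names are assumptions for illustration only.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the two tables involved."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY, Caption TEXT);
        CREATE TABLE DiscoveredNetObjectStatuses (
            ProfileID INTEGER,
            DiscoveredObjectID INTEGER,
            ManagedNetObjectID INTEGER,
            PRIMARY KEY (ProfileID, DiscoveredObjectID)
        );
        INSERT INTO Nodes VALUES (101, 'web01'), (202, 'db01');
        -- The same DiscoveredObjectID (1) appears under two different profiles.
        INSERT INTO DiscoveredNetObjectStatuses VALUES (1, 1, 101), (2, 1, 202);
        """
    )
    return conn


def resolve_node(conn, profile_id, discovered_object_id):
    """Return (NodeID, Caption) for a discovered object, or None if unmapped."""
    return conn.execute(
        """
        SELECT n.NodeID, n.Caption
        FROM DiscoveredNetObjectStatuses AS s
        JOIN Nodes AS n ON n.NodeID = s.ManagedNetObjectID
        WHERE s.ProfileID = ? AND s.DiscoveredObjectID = ?
        """,
        (profile_id, discovered_object_id),
    ).fetchone()
```

Filtering on DiscoveredObjectID alone would match rows from both profiles in this toy data, which is exactly the mistake to avoid when deleting discovered elements by ID.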
So -- here are the questions: Is this idea too crazy to implement? Has anyone else ever done something similar? Is there an easier way to discover the volumes and interfaces that may have been added by the platform teams but not forwarded to the monitoring team for updating?