Alert Reset Conditions After NPM 12 Upgrade

June 16, 2016, 6:02 am


Prior to the NPM 12 upgrade, I set all my node down alerts to reset when the trigger condition is no longer true. The trigger condition is node status equals Down. I then send a reset email that has the state variable in the subject. The alert used to reset on Up, but now it resets with the status Warning, which isn't actually true (a continuous ping from SolarWinds shows the node never responds and is actually down). Then, at the next check of the alert, it goes back to Down. I haven't changed anything in my alerts between upgrades - is this a change in the behavior of the alerts engine? Any thoughts on how to rewrite the alert if this is the expected behavior now?

Screenshots of current example alert:

Do I need to change the reset condition to custom and reset when status is equal to up?
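The custom reset asked about here can be pictured as a small state check: trigger on Down, reset only on Up, and leave the alert untouched for any other status. The Python sketch below only illustrates that idea and is not the alert engine's actual logic; the status strings and the names `next_alert_state` and `replay` are assumptions made for the example.

```python
def next_alert_state(active: bool, status: str) -> bool:
    """Return whether the alert is active after seeing one polled status.

    Hypothetical model: trigger on "Down", reset only on "Up".
    Any other status (for example "Warning") leaves the alert unchanged.
    """
    if status == "Down":
        return True
    if status == "Up":
        return False
    return active


def replay(statuses):
    """Replay a sequence of polled statuses and collect the alert state after each."""
    active = False
    history = []
    for status in statuses:
        active = next_alert_state(active, status)
        history.append(active)
    return history
```

With this model, a Down, Warning, Down sequence keeps the alert active the whole time, so no reset email would go out until the node is really Up.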

↧

Network Performance Monitor v12.0 Upgrade Training - Video.pdf

July 27, 2016, 4:29 am


↧

Netpath not establishing any connections

November 3, 2016, 1:07 pm


Within the last 2 weeks, all but one of our Netpaths have stopped displaying any data, just displaying the "No Data Found" message.

I deleted all of them and tried making new ones, but none of them are getting any data. I have them going from the main SolarWinds server as well as from remote agents.

It's even stranger that one path kept working while all the others stopped, and no new ones will establish.

I have done the recommended database cleanup, but there was no improvement.

Case #1073734 has been opened, but I have not received a response yet.

Anyone else have a similar experience?

↧

Worldwide Map - Location Field Issues

September 15, 2016, 8:57 am


Hello,

I've already raised an FR about this: when WMI is used, the AD Sites and Services name is used for the location, and that name isn't in OpenStreetMap format. I was hoping someone else may have a clever way to get round the following issues.

Issues

The location field sometimes has character limits

WMI uses the AD Sites and Services name as its 'location'.

Possible Fixes

Change the field used by the WorldWideMap (then use Alerts & Actions & Custom Properties to set a value)

Create a mapping table for Addresses to site codes

Anyone have any other suggestions?

Thanks,

Pete

↧

When you installed NPM, did you add Nodes manually or did you run discovery?

July 20, 2016, 12:33 am


We would like to improve the user experience, so I'd like to better understand whether our users prefer INITIALLY to add nodes manually or to run the product's network discovery to import devices into NPM.

↧

Layer 2 and Layer 3 Connection/Device Correlation

November 9, 2016, 11:27 am


We are trying to identify which devices are interconnected. Looking at the tables in the database, I see the following:

NodeL2Connections

NodeL3Entries

When querying these tables, it does not appear possible to correlate device connections. The end goal is pretty simple: a query that provides a unique list of nodes showing each node's upstream and/or downstream connections.
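As a rough illustration of the end goal, the sketch below takes connection rows that have already been reduced to pairs of node names and builds a unique per-node neighbor list. The pair shape and the function name `build_neighbor_map` are assumptions for the example; they are not the actual column layout of NodeL2Connections or NodeL3Entries, and it does not distinguish upstream from downstream.

```python
from collections import defaultdict


def build_neighbor_map(connections):
    """Map each node to the sorted, de-duplicated list of nodes it links to.

    `connections` is an iterable of (node_a, node_b) pairs. This row shape
    is hypothetical and must be derived from the real tables first.
    """
    neighbors = defaultdict(set)
    for node_a, node_b in connections:
        if node_a == node_b:
            continue  # ignore rows where a node points at itself
        # Record the link in both directions so each node lists its peers.
        neighbors[node_a].add(node_b)
        neighbors[node_b].add(node_a)
    return {node: sorted(peers) for node, peers in sorted(neighbors.items())}
```

Because links are stored in a set, a connection reported from both ends collapses into a single entry per node.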

↧

Calling All Arista Geeks!

October 27, 2016, 12:26 pm


We're working on some device support improvements around Arista. To ensure broad coverage, we'd like to get as many SNMP walks as possible to verify functionality against. If you have Arista gear that is missing hardware health info today, please shoot me an SNMP walk! Hardware health includes power supply health and voltage, fan status and speed, and temperature.

Our very own SNMP walk tool outputs in a format that can be used by tooling we've built to automatically test compatibility. For that reason, I'd ask you use that tool specifically to take the walk. Instructions here. You can upload here.

Thanks!

↧

Hardware Details in Report

November 4, 2016, 5:19 am


Hi All,

I would like to create a report where we can get the below highlighted (Hardware) details against the node.

I used the query below to find every column with 'Hardware' in its name across all DB tables, but unfortunately I did not find the same info in any table.

SELECT COLUMN_NAME, TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Hardware%'

Could anyone please help us build a good report that shows how many nodes are physical and how many are virtual, and where each one is hosted? This tab also shows info such as which ESXi or Hyper-V host a VM is running on.

Thanks

↧

Alert - Responding to SNMP?

March 7, 2006, 7:04 am


I would like to create an alert when a device stops responding to SNMP. I do not see this as a choice when creating an alert. I can, however, create a view in SYS Mgr with this field.
Can anyone help me determine how to create this alert?

↧

Rabbitmq

November 10, 2016, 12:15 pm


Hello All,

Just wanted to check whether installing RabbitMQ on my servers is a must for the NPM 12 upgrade?

Thanks,

Malcolm.

↧

Newbie - Web Console wouldn't start after 12.0.1 install - FIXED

November 10, 2016, 12:29 pm


After installing a new v12.0.1 NPM system on new servers, the web console would not come up; it gave "page could not be found" errors. Google searches turned up sort-of solutions with rather complicated possibilities.

Here is the solution that was very simple:

1. Start the MS IIS server on your Orion server. You'll get an item in the list under Connections with the name of your NPM server. Expand the listing and you'll see

2. Right click on the db name of the db you are using in SQL. In this case, it's NetPerfMon. Then choose the option "Edit Bindings".

3. In the Site Bindings window, you'll probably see only the line for the port 80 http binding, but you may not. However, it will have some kind of mistake in it that is preventing the web console from appearing.

4. Add a line similar to what I have in the first binding. In my case I wanted to ONLY have https access on the normal port 443. That's why you see only a single line above. I created it then deleted the mistaken line which was causing the problem.

5. Add the site binding you want. If you want only http access on the usual port 80, enter your server name then click OK.

I left the IP address as "All Unassigned" because I did not yet have an SSL certificate (from a CA) installed for this website. The name under SSL certificate is filled in by default and will cause the usual security error saying the certificate is suspect when you use your URL to access the Web Console. I clicked on EDIT to get the screen below to show this certificate issue. If you do later get a cert, you can add it here if you want https access to the Web Console.

6. Now delete the problem binding from the list of bindings. You'll only see your binding(s) you want to use.

7. Open a CMD window on your server and type the "iisreset" command to restart the IIS service so it can read in the new binding info you added.

You should be able now to start your Web Console, a rather important website to have functioning...

↧

Can we alert for SNMP not responding nodes?

April 8, 2016, 6:32 am


Hi all,

I searched for this topic, but all the threads are very old, so I want to know whether we can alert on nodes that are not responding to SNMP.

I have only 5-6 nodes for which alerting is required.

So can anyone help me on this?

We actually had an issue where some devices were found in a hung state and had to be rebooted to bring them back to normal. So, in my view, alerting on non-responsiveness to SNMP would at least meet this requirement.
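One way to think about the condition being asked for is "the node still answers ping, but SNMP has been silent for too long". The Python sketch below models only that logic; it is not how NPM evaluates alerts, and the function name `snmp_unresponsive`, its parameters, and the 10-minute default are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def snmp_unresponsive(ping_ok, last_snmp_success, now,
                      max_age=timedelta(minutes=10)):
    """Return True when a node answers ping but SNMP has gone quiet.

    ping_ok:           whether the latest ICMP check succeeded
    last_snmp_success: datetime of the last good SNMP poll, or None
    now:               current time, passed in so the check is testable
    max_age:           hypothetical threshold for a "stale" SNMP poll
    """
    if not ping_ok:
        return False  # a node that is fully down belongs to a node-down alert
    if last_snmp_success is None:
        return True   # reachable, but has never answered SNMP
    return now - last_snmp_success > max_age
```

Keeping the ping check separate means a hung device that still answers ICMP is flagged, while a powered-off device is left to the ordinary node-down alert.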

↧

High Availability and Disaster Recovery Solution with Full Servers and Site Protection - RFC/Design Stage

October 25, 2016, 5:46 am


The approach below is at the design stage. In theory everything looks cool, and I will keep you posted soon on what happens in practice.
Your comments, concerns, feedback, and suggestions are highly welcome.

Hi All,

Since the release of NPM 12.0.1 we have an exciting new feature in the Orion platform - High Availability. The only minor downside is that it requires the HA cluster to sit in the same subnet. Oops... For some that is minor, for others major. What do you do when you have two sites and cannot easily stretch a subnet due to architectural constraints?

Well, I have been digging into this for the past several weeks. FoE is not being sold anymore. It may well be supported, but you certainly cannot buy it if you are a new customer. I started an FoE thread here, which has great insights, although it is no longer relevant. At the moment there seems to be a gap for a smooth inter-site fail-over solution, and I hope it will be plugged soon. As of now SolarWinds offers an Active-Active approach (you can find a PDF about it in the FoE thread as well, or just get it here directly).

At first I thought that maintaining two instances manually would be a huge pain and a massive overhead for engineers, and this stopped me from accepting the idea. However, after thinking further and consulting with SolarWinds support, I have realised that we do not need to. Here is what I came up with:

(1)

First - you do need to purchase a second set of the licenses you already have (this is the most painful step). To soften this a bit, contact your SolarWinds reseller partner (or SolarWinds directly) and ask for a 50% discount. This is known as a "Disaster Recovery License" and is offered upon request.

(2)

Deploy the live environment as normal: add all nodes, configure settings, alerts, etc. - usual practice, nothing fancy here.

(3)

Deploy an exact copy of your live environment at the DR site, use the Disaster Recovery License, and point it to an "empty" database. By "empty" I mean that you do not need to populate it with any assets; just install a fresh deployment and leave it as is. You don't even need to configure any settings, views, etc. - just a vanilla setup (cold standby, so to speak).

Now, you will have an Active-Active setup (although not quite Active-Active as initially suggested by SolarWinds in the above PDF. I would rather call it Active-Reserved, because at the DR site you do not add any devices and you do not configure it)

DR considerations:

The SolarWinds kit at the DR site will just sit there and do nothing until DR is invoked. We have a database AG in place (formerly a mirror), so in the event of a main site failure SQL will fail over to the mirror copy at the DR site. Because we already have SolarWinds deployed at DR, all that is left to do is run the configuration wizard and re-point the DR deployment from the empty db to the mirror copy of the live db at the DR site. The recovery process should last as long as it takes the configuration wizard to complete its magic (I haven't measured it yet, and it will depend on the number of modules, hardware kit, etc., but I anticipate under 2 hours). It is not an instant fail-over as with HA, but it is good enough, provided that monitoring is not itself a business-critical tool. In a site fail-over, monitoring is definitely not high on the list of priorities for the business to worry about, unless you are a managed monitoring solutions provider (which I hope you are not, as otherwise a 2-hour recovery may be a "killer").

If you do not have SQL AG, simply ensure you back up your SQL database at the live site and transfer the backup to the DR site on a regular basis. There is no need to restore it in advance, but a fresh backup should be available to restore the DB at the DR site in the event of DR.

To "complicate" things further (or I would better say to safeguard and increase availability) - we also plan to create HA cluster for application layer (new feature) at both sites, therefore protecting from local app server failures. Although HA at DR might be a bit excessive - we would like to keep things as closely mirrored as possible between DR and LIVE sites

And, yet another thing - NTA server. You can have it on same APP box (which is not recommended, although fully supported), or you can have a separate box. In the event of DR you simply recover NTA DB from backup at DR site and then you should be able to switch to DR box as well

Grey areas:

After running the config wizard and re-pointing the DR instance at the live db, it is not clear how to proceed with NTA recovery, in particular which steps are involved in recovering the NTA box. This needs further testing, but I guess the standard recovery approach will apply here.
Running two databases for different SolarWinds deployments within one SQL instance (see image below). After reading a lot of manuals I could not find any reason why this would not be possible. Your comments are highly appreciated here.
Restoring the [App DR] <--> [HA DR] relationship after failover (after running the config wizard at the DR site and re-pointing to the live db). I am not sure what will happen with the HA cluster at this point - again, it needs testing. We have asked SolarWinds support to confirm and are still waiting for them to come back.

Final say:

Once again, after digging through many different options - this one seems the most appealing, with virtually no overhead for Engineers on a day-to-day running - which is key to not overload them. Running Active-Active and managing all changes manually at both ends is way too much - sorry, no, sorry

Diagrams:

Normal operation

Site failover:

↧

NCM/NPM alert schedule based on custom table?

November 2, 2016, 1:00 pm


Our company owns several facilities that host events. During these times we turn everything on and watch it, and "when the carnival leaves town" we shut it all down. I know I can just click these things and unmanage them until the next event, but it's inelegant. I'd like something to handle this automatically.

I was thinking I could add a table to the database with 3 columns, something like:

SITE | Event start | Event end
abc  | Dec 1       | Dec 8
abc  | Dec 12      | Dec 17
bcd  | Nov 7       | Nov 15
cde  | Mar 1       | Mar 8

Any site could have any number of events. I just need to query for the site in question and see whether any events are "ON" at the moment; if so, I need to alert on the designated devices, and if nothing is going on, I need to ignore or unmanage those devices.
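The schedule lookup described here can be sketched with an in-memory SQLite table standing in for the proposed custom table. This is only an illustration of the query logic, not a way to hook it into NPM or NCM; the table name `EventSchedule`, the column names, the helper names, and the use of ISO date strings with an explicit year are all assumptions.

```python
import sqlite3
from datetime import date


def make_schedule(rows):
    """Create an in-memory table of (site, event_start, event_end) rows.

    Dates are stored as ISO strings (YYYY-MM-DD) so that plain string
    comparison orders them correctly.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EventSchedule (site TEXT, event_start TEXT, event_end TEXT)"
    )
    conn.executemany("INSERT INTO EventSchedule VALUES (?, ?, ?)", rows)
    return conn


def site_is_live(conn, site, today):
    """Return True if any event for `site` covers `today` (inclusive)."""
    day = today.isoformat()
    row = conn.execute(
        "SELECT COUNT(*) FROM EventSchedule "
        "WHERE site = ? AND event_start <= ? AND event_end >= ?",
        (site, day, day),
    ).fetchone()
    return row[0] > 0
```

A site with several events simply has several rows, and the same query answers "is anything ON right now" regardless of how many events are scheduled for the year.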

I've played with custom fields and we use them quite a bit but they don't lend themselves well to a multi-event type of setup. I'd like to just put the whole schedule in one spot for the year and have the system handle everything else automatically.

Any suggestions?

↧

What are the server specs?

November 10, 2016, 1:29 pm


What are the recommended server specs (memory, Windows Server version, etc.) for NPM 12.x on 2 physical servers with high data traffic and a SQL database, plus a VM as a secondary poller?

↧

Reporting on a UnDP

November 10, 2016, 2:26 pm


Hi Folks,

I'm hoping you're all smarter than me because I'm hitting nothing but dead ends.

I created a new UnDP today to poll my Cisco devices for their last reload reason. This UnDP is sitting in the default category and is titled "whyReload".

I wanted to create a report of any units whose reload reason is not power on, reload command, etc., to check for nodes that are crashing.
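The filter described here amounts to "keep every node whose reload reason is not on a list of expected reasons". The Python sketch below shows that logic on already-exported rows; it does not solve the Report Writer visibility problem, and the reason strings, the tuple shape, and the name `unexpected_reloads` are assumptions, since the actual text returned by the poller varies by device.

```python
# Substrings treated as normal reload reasons (assumed wording).
EXPECTED_REASONS = ("power-on", "power on", "reload command")


def unexpected_reloads(rows, expected=EXPECTED_REASONS):
    """Return the (node, reason) pairs whose reason matches no expected substring.

    `rows` is an iterable of (node_name, reload_reason) tuples, a
    hypothetical export of the "whyReload" poller values.
    """
    flagged = []
    for node, reason in rows:
        text = reason.strip().lower()
        if not any(marker in text for marker in expected):
            flagged.append((node, reason))
    return flagged
```

Matching on lowercase substrings keeps the check tolerant of small differences in capitalisation between platforms.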

I tried editing an old Report Writer report that shows the last boot time of units but I am not able to find my newly created UnDP in the Report Writer interface.

I've attempted to create a new report in the Web GUI Reporting system, but again I'm unable to find my UnDP in there.

I feel like I'm missing something super simple, but for the life of me I can't figure out what's wrong. I've followed multiple other posts that cover the same issue, but they generally point me to Report Writer, and when I go to select the field, Custom Pollers isn't an option for me to click.

Thanks!

Brian

↧

What We're Working on for NPM (Updated October 25th, 2016)

March 30, 2015, 1:55 pm


Since the release of NPM 12.0 we've been hard at work building the next round of exciting functionality and improvements to existing functionality. I'm pleased to share the following list of items we're working on:

Improved Meraki Wireless Support - Full support for Meraki wireless access points
Silence Alerts - Silence Alerts While Still Monitoring
Network Insight for Cisco ASA - Covering things like VPN tunnel monitoring, ACL visibility, and improved platform health (H/A status, hardware health).
Faster and web based simplified product upgrades and installation of hot fixes
Integrated Disaster Recovery Engine
Windows Authentication for SQL DB - Orion can optionally use AD user credentials to authenticate to the SQL database instead of SQL local credentials.
Performance Analyzer - Pull metrics, statuses, events and other data sets into a single view with a shared timeline for correlation and faster troubleshooting.
Import nodes from file with custom properties

↧

Anyone else having issues getting NPM to map out the topology correctly?

October 26, 2016, 7:37 am


What are the minimum requirements for a node so that discovery can populate the NPM Network Topology Map fields?

On both nodes I checked all of the routing options, and layer 2 and layer 3 topology. Sometimes it works and sometimes it doesn't.

What are the mechanics behind this?

When it doesn't work I can't connect the nodes together in Orion Network Atlas.

Any suggestions?

Node 'A'

Node 'B'

↧

Best MIBs to use for DMVPN tunnel status

June 24, 2010, 9:45 am


I'm trying to find a MIB poller that will let me know the status of our DMVPN tunnels. The tunnel status is always up, so I was looking for MIBs that could give me the state of the crypto isakmp sa command. I found some isakmp MIBs in the universal device poller but they were not supported on the devices I was trying to use them on (3945s, 2811s). I also looked for any MIBs that would pull NHRP information and came up empty on that as well.

So, I'm looking for advice on what others have done to monitor their DMVPN tunnels.

Any input is much appreciated.

Thanks,

Tammie

↧

Monitoring Cisco IWAN

June 28, 2016, 4:06 pm


Hello All,

We will be implementing Cisco IWAN in our company soon, and I was wondering whether SolarWinds has the ability to monitor IWAN.

If a router fails over on the IWAN, we more than likely won't get notified because it's such a seamless process. We would like to know if connectivity is degraded, since otherwise we would never be alerted.

I hope this makes sense. Any help will be greatly appreciated.

Thanks

↧