Channel: THWACK: All Content - Network Performance Monitor
Viewing all 21870 articles

How to add an additional monitored interface to multiple existing nodes at once

I have groups of like devices (at least 100 in each group, about 2,000 devices in total), and I need to monitor the same additional interface on every one of them.  When the devices were added, only one interface was selected, but we now realize we need the other one as well.  I am REALLY hoping I do not have to touch each node individually. I have also read horror stories of people running a Sonar discovery and ending up with nodes listed twice, which I want even less.  I am currently on NPM 11.5.2.  Has anyone found a way to do this?  If so, could you please explain how you did it?  Thanks!


Hardware requirements for additional polling engine using several modules?

Hello everyone, we are considering adding another polling engine to our SolarWinds environment, and we are re-evaluating our hardware needs for additional pollers. I am having a difficult time locating hardware requirements that cover more than one module at a time. We are running the following modules:

 

NPM 12.0
IPAM 4.3.2
SAM 6.2.4
DPA 10.0.1
NCM 7.5
IVIM 2.1.2
QoE 2.1.0
NetPath 1.0
NTA 4.2.0
UDT 3.2.4

 

We currently have 2 licenses per server, and all of the servers in our SolarWinds environment are virtual, if that matters. I would also like to confirm whether 12,000 total elements per additional poller is the maximum we should allow, or whether there is a lower number at which we should consider moving nodes to another polling engine. We have no nodes that we can stop monitoring, and the polling rate on one of our additional pollers has been hovering at 60% with just under 9,000 total elements, so we are trying to be proactive before it reaches 85% and we have to scramble for a proper resolution.
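
As a rough sanity check on the numbers above, the element count at which the poller would reach 85% can be extrapolated from the current reading. This is only a sketch that assumes polling rate grows linearly with element count, which may not hold in practice (polling intervals and the mix of element types also matter). The function name is hypothetical, and 9,000 stands in for "just under 9,000".

```python
def projected_elements(current_elements: int, current_rate: float, target_rate: float) -> int:
    """Estimate the element count at target_rate, ASSUMING load scales linearly."""
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    return int(current_elements * target_rate / current_rate)


# Figures from the post: roughly 9,000 elements at a 60% polling rate.
limit_at_85 = projected_elements(9000, 60, 85)
headroom = limit_at_85 - 9000

print(f"Estimated elements at 85%: {limit_at_85}")
print(f"Estimated headroom: {headroom}")
```

Under that linear assumption the poller would reach 85% somewhere near the 12,000-element figure, so the two numbers are at least consistent with each other.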

 

I apologize if I'm leaving out any pertinent information. I am a SolarWinds newbie and at the very beginning of the learning curve here. If there is anything I left out, please let me know and I'll be happy to provide it. Thanks in advance!!

Possible bug?

Hi folks,

 

I've been having just the worst time ever trying to resolve my Information Service v3 issues. First, a .NET Framework issue was identified and resolved. Second, a broken IPAM view was fixed. Third, an issue with NetFlow was resolved. The service has been uninstalled and reinstalled more than 20 times between my own troubleshooting and troubleshooting from support. One by one, the error messages cleared from the SWIS v3 log, but one issue remains: over the course of a week, CPU and memory use slowly creep up and are never released. The log is no longer huge. We went from several log files being created within a few hours to just one over the span of a week, and it is not large.

 

No errors are being observed any longer, only a few warnings, and nothing seems to be affecting performance. However, there does appear to be a resource leak that eventually takes up all of the resources on my server.

 

Initially we had a build spec'd at 8 cores at 2.4 GHz. Now we have 24 cores at 2.4 GHz, and every core sits at about 50% use, for a total of 50% of the entire CPU. When I check processes, the only one using any CPU is Information Service V3, at between 40 and 60 percent, with 1 to 2 GB of memory.

 

Could this be a resource leak in the code that SolarWinds has not caught? Has anyone noticed Information Service V3 using this amount of resources or more in their environment?

 

Thanks!

ASA failover monitoring and alert setup

I currently have a UnDP set up to monitor OID: 1.3.6.1.4.1.9.9.147.1.2.1.1.1.4

This is displayed on my node details page and looks like this:

However here is my first issue:

1) The above image was pulled from my device named pix-primary. I know for a fact that this device is running as the standby PIX, because it has already failed over to pix-secondary. The OID output above tells me that "(this device)" is the Active unit, which is untrue: pix-primary should be the standby unit now. Or am I reading this wrong?

2) If I set up an alert to email me when the status changes, how would I set up the reset trigger? The status changes and an alert is sent, but when the status changes back, that too is a status change, so this alert will always trigger and never reset. Is that correct?
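
One way to reason about question 2: an alert whose condition is "the status changed" has no natural reset, whereas a condition of "the status differs from the expected role" does. The sketch below models the second approach in plain Python. It is not Orion's alert engine, and the class name and status strings are hypothetical stand-ins for whatever the UnDP returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedStateAlert:
    """Trigger when a polled status leaves its expected value, reset when it returns."""
    expected: str          # hypothetical expected role, e.g. "active"
    active: bool = False   # whether the alert is currently triggered

    def evaluate(self, status: str) -> Optional[str]:
        """Return 'trigger', 'reset', or None for one poll result."""
        if status != self.expected and not self.active:
            self.active = True
            return "trigger"
        if status == self.expected and self.active:
            self.active = False
            return "reset"
        return None


alert = ExpectedStateAlert(expected="active")
events = [alert.evaluate(s) for s in ["active", "standby", "standby", "active"]]
print(events)
```

With this shape, a failover produces one trigger and the failback produces one reset, instead of two triggers.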

What Alerts Could Trigger on Node X?

Does anyone have a good method for determining all possible alerts a specified node could trigger? I have about 300 alerts, and I need to spot check certain nodes to see what alerts could be triggered.

 

For example, let's say I provide N:1234. I would want to see a list of alerts that particular node might trigger, including its child elements (V, I, AM, AA, UnDPs, ...).

 

Has anyone written a report or tool that could do this for me?

False reboot alert for several Windows server VMs

Today SolarWinds alerted us that 11 VMs (each on a separate ESXi 6.0 'free' host) had rebooted.  All the alerts were within 30 minutes or so of each other.

None of the VMs had actually rebooted.  We monitor these VMs in NPM as Windows nodes, and we also monitor their respective ESXi hosts.

 

The ESXi host monitoring is very recent (I added them to NPM just a few days ago) so I wonder if that could be a factor.  There's no SNMP service running on the VMs.  They all run Windows server 2008 Ent.

Where is the DNS server that NPM uses?

In the node properties there is a DNS line. Where is that configured? Our DNS has moved, so it now comes up as invalid.

 

JR

Alert trigger where the value is evaluated against a custom property?

Hello,

 

I have been struggling to find an answer to this and am hoping someone might be able to help.   Within NPM I would like to create an alert that has a trigger evaluated against a custom property.   For example I have an alert built for a temperature sensor that looks something like this: "Custom Poller Current Stats - Current Value - is greater than - 80".  This works perfectly and triggers/resets the alert as appropriate when the temp gets higher/lower than 80.

 

Instead of hard-coding a specific value in the alert trigger, I would like to reference a custom property.   I have tried both of these formats without success:

"Custom Poller Current Stats - Current Value - is greater than - ${TemperatureAlertWarningThreshold}"

"Custom Poller Current Stats - Current Value - is greater than - ${N=SwisEntity;M=Node.CustomProperties.TemperatureAlertWarningThreshold}"

 

In either of the above two examples the alert immediately triggers as soon as I have saved it, regardless of what the current temp of the device is and what the custom property is set to.

 

Any assistance or advice would be appreciated.   My goal is to be able to specify a custom warning/critical threshold for each device without having to build separate alert rules for each one.   I was hoping custom properties would let me specify these values on a per-device basis while using only a single alert rule.
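
To make the goal concrete, here is a minimal sketch of the comparison a single rule would need to perform: each device's current value checked against that device's own threshold. It is plain Python rather than Orion alert syntax, and the node names, field names, and sample values are all hypothetical.

```python
from typing import Optional


def should_alert(current_value: float, threshold: Optional[float]) -> bool:
    """Compare one reading against its own per-device threshold.

    A device with no threshold set never alerts in this sketch.
    """
    if threshold is None:
        return False
    return current_value > threshold


# Hypothetical readings; "threshold" plays the role of the custom property.
readings = [
    {"node": "sensor-a", "current_value": 83.0, "threshold": 80.0},
    {"node": "sensor-b", "current_value": 83.0, "threshold": 90.0},
    {"node": "sensor-c", "current_value": 70.0, "threshold": None},
]

alerting = [r["node"] for r in readings
            if should_alert(r["current_value"], r["threshold"])]
print(alerting)
```

The same reading of 83 alerts on one device and not on another, which is the per-device behavior a single rule is meant to provide.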

 

Thanks!
Scott


Orion Alert History Table Retention (Post NPM 11.5) (SOLVED)

EDIT (See Below for the Delete Query I Ran Manually):  Thanks to ctlswadmin's prompting, I checked my nightly maintenance log (swdebugmaintenance.log) and sure enough, the maintenance was timing out each night when it came time to work on the AlertHistory table.  I manually deleted all alerts older than 90 days from that table where their AlertActiveID was not in the AlertActive table and then manually ran a DB maintenance from the Orion server.  All's good now.  Thanks for all of the responses and help.

 

As confirmed by LadaVarga's response here, there is no way to set a retention period for the Alert History table.  And I specifically mean the table in the Orion database called AlertHistory, not the Active Alerts table in the web console.  We have a ton of alerts and our Alert History table is HUGE.  I can query it and get results from as long ago as 12/24/2015, which is a lot more than the 60 days our "Event Retention" setting is set at.  I bring up the Event Retention because in LadaVarga's reply I linked to she mentions that the AlertHistory table uses the Event Retention time, which doesn't seem to be true since our Events table only has the last 60 days worth of events whereas our AlertHistory table has 9 months of events.  I also find that 9 months is a pretty odd number of days since I don't see any setting in our Polling Settings where we have 270 days set as the retention period...

 

I see a Stored Procedure in the Orion database called dbm_AlertHistory_DeleteStale, but what process within the Orion software uses this procedure and how can we decide the datetime value to feed into this?  Because of the AlertHistory table's size, a lot of our Orion web console views that display historical alert information take a really long time to load.  Our SWISv3 logs are full of "Query Took a Long Time to Execute" warnings, and nearly every one of the queries causing the warning is against the AlertHistory table.

 

Is it safe for me to copy/paste the SQL from this stored procedure and manually run it with the datetime variable defined to a value of my choosing?  I'm thinking anything older than 90 days needs to go.  What effect will this have on the Alert Details views since they give a lot of historical info about the alerts, like how many times a particular object has triggered this alert for example?

 

 

Here is what I ran manually to get my AlertHistory table down to a smaller size so that the nightly maintenance could take over again without timing out on this table.  You can change the number of days to match the Event Retention period in your Polling Settings:

 

BEGIN
    SET NOCOUNT ON;
    DECLARE @EventHistoryRetention as int
    DECLARE @ChunkSize as int

    /* This number should be equal or greater than your Event Retention setting in Settings > Polling Settings */
    SET @EventHistoryRetention = 90

    /* How many records to delete at a time.  I recommend leaving this at 1,000 */
    SET @ChunkSize = 1000

    SET IMPLICIT_TRANSACTIONS OFF
    SET ROWCOUNT @ChunkSize

NextChunk:
    DELETE FROM
    /* You can comment out the "DELETE FROM" line above and uncomment the "SELECT * FROM" line below to see the first chunk that it would delete.  Good for verifying before running. */
    --SELECT * FROM
    AlertHistory
    WHERE [TimeStamp] < DATEADD(DAY, -@EventHistoryRetention, GETDATE())
      AND AlertActiveID NOT IN (SELECT AlertActiveID FROM AlertActive)

    IF @@ROWCOUNT = @ChunkSize GOTO NextChunk
END
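
The chunked-delete pattern used above can be tried safely outside a production database. The following is a minimal sketch that reproduces the same idea against an in-memory SQLite database from Python: delete rows older than the retention window whose AlertActiveID is not in AlertActive, a fixed number at a time, until a short chunk signals the end. The table layouts and sample rows are simplified assumptions, not the real Orion schema, and the function name is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_alert_history(conn, retention_days, chunk_size=1000, now=None):
    """Delete stale AlertHistory rows in chunks; return the total deleted."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM AlertHistory
            WHERE rowid IN (
                SELECT rowid FROM AlertHistory
                WHERE TimeStamp < ?
                  AND AlertActiveID NOT IN (SELECT AlertActiveID FROM AlertActive)
                LIMIT ?
            )
            """,
            (cutoff, chunk_size),
        )
        conn.commit()  # commit per chunk so each transaction stays small
        total += cur.rowcount
        if cur.rowcount < chunk_size:  # short chunk: nothing left to delete
            return total


# Simplified stand-in tables (assumed layout, not the Orion schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AlertActive (AlertActiveID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE AlertHistory (AlertActiveID INTEGER, TimeStamp TEXT)")

now = datetime(2016, 9, 30)
conn.execute("INSERT INTO AlertActive VALUES (1)")
rows = [
    (1, (now - timedelta(days=200)).isoformat()),  # old but still active: kept
    (2, (now - timedelta(days=10)).isoformat()),   # recent: kept
] + [(100 + i, (now - timedelta(days=120)).isoformat()) for i in range(2500)]
conn.executemany("INSERT INTO AlertHistory VALUES (?, ?)", rows)

deleted = purge_alert_history(conn, retention_days=90, chunk_size=1000, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM AlertHistory").fetchone()[0]
print(deleted, remaining)
```

Committing after every chunk keeps each transaction short, which is the same reason the T-SQL version loops with a row limit instead of issuing one large DELETE.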

Custom MIB Use

So I have found a MIB for a device in the MIB browser, but after selecting the MIB I can't find anywhere to select OK / APPLY. Am I missing something here? Do I then need to poll the device for new MIB details? I'm new to this custom MIB malarkey and floundering around, but getting there slowly.

 

Any advice greatly appreciated.

Wireless AP no longer detected after 12.0.1 upgrade from 12.0

We have Avaya Wireless APs, which are OEM Xirrus units.  Since the 12.0 upgrade we had been receiving AP-type stats on each of the endpoints' home pages. However, since the upgrade to NPM 12.0.1 these seem to have disappeared.

 

It is also possible that a recent software upgrade on the APs, from 7.5.4 to 8.1, caused this.

Test case

Adding RHEL4 to SolarWinds monitoring

Hi, I have a RHEL4 server that I want to add to our NMS, SolarWinds NPM. I am editing the snmpd.conf file, but I am unable to validate the node from NPM 10.4. Please guide me on what I should write in the snmpd.conf file.

I am using v2c and a community string, for example

 

 

Please guide me on this.

 

 

Thanks

Cisco ASA 9.6 VPN Pollers

High number of discards - Cisco 3850

We recently replaced several Cisco 2960s with 3850 stacks. After setting them up in Orion, we get several million transmit discards per hour on some ports on these switches, but not on any other models in our environment. "sh int" displays no errors. Has anyone else seen this?

 

Orion Platform 2015.1.2, NCM 7.4, IPAM 4.3, NPM 11.5.2, IVIM 2.1.0, QoE 2.0, NTA 4.1.1

 

Cisco IOS Software, IOS-XE Software, Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 03.06.03E RELEASE SOFTWARE (fc3)


Network Atlas just says connecting and never opens

Hi Guys,

 

Network Atlas simply says "connecting" and never actually connects.  Everything else works great.  I have rebooted the server and reinstalled Network Atlas.  Has anyone ever encountered this before?

Planning a SolarWinds installation on Cisco UCS

I am preparing to install my SolarWinds suite of applications (NPM, NCM, NTA, SAM and IPAM) on a Cisco UCS platform. I have done this several times on traditional virtual/physical Windows servers, so I know how to scope the server CPU, memory and storage resources.

 

How do I translate resource planning for a Cisco UCS? Can I do it all on one UCS unit? What "formula" do I use to know how much RAM and CPU I will need?

 

Thanks for any experienced help.

OSPF Neighbor down Alert

We are using NPM 11.0.1 to monitor OSPF neighbor events on our Cisco equipment. Several times now, this event has been logged:

 

12/5/2014 2:22 AM  The neighbor 10.225.3.198 on Node ourhost.com went down.

 

When we checked the switch for corresponding neighbor events on that date and time, none were found. Why is this alert being generated in SolarWinds when the L3 switch/router does not log any OSPF events?

 

Any help would be appreciated

 

Thanks!!

Preferred place to install the NetPath probe agent

Hello Guys,

 

Where do you prefer to install the NetPath probe agent? Is it a good idea to install it on each SolarWinds server? I would appreciate your feedback. Thank you so much.

NetPath is awesome! Except for Linux support

Is there any way to get an agent installed on an Ubuntu server? I've found little to no documentation on this, and it seems like a huge problem for any enterprise environment running Linux machines.

 

There did appear to be an option to install a Linux agent via Admin -> Manage Agents -> Agents, but even though my server is listed as "Agent is running" with Connection Status: "Connected", it never completes a poll.  I even removed the entire iptables firewall from the server for testing purposes.  No luck.

 

Anyone have a howto guide on getting this to work?
