Channel: THWACK: All Content - Network Performance Monitor
Viewing all 21870 articles

HA ip calculation

Hi,

 

One of our customers currently has a SW server with an IP address like x.y.z.131; the gateway is x.y.z.1.

The customer wishes to set up HA mode and also wants to keep the current IP address as the HA VIP address.

According to the SW documentation, we have to convert the IP address to binary and compare it to the gateway IP address.

Unfortunately, in our case

the gateway is

11000000101010000000101000000001

the VIP address is

11000000101010000000101010000011

we only match on the first 24 bits.
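The binary comparison described above can be reproduced with a short Python sketch. The dotted addresses 192.168.10.1 and 192.168.10.131 are simply the two binary strings from this post decoded back to decimal; the helper names `to_binary` and `common_prefix_bits` are made up for illustration. The sketch only counts matching leading bits and makes no claim about which prefix length the HA setup accepts.

```python
import ipaddress


def to_binary(addr: str) -> str:
    """Render an IPv4 address as a 32-character binary string."""
    return format(int(ipaddress.IPv4Address(addr)), "032b")


def common_prefix_bits(a: str, b: str) -> int:
    """Count how many leading bits two IPv4 addresses have in common."""
    # XOR leaves a 1 wherever the addresses differ; the highest set bit
    # marks the first mismatch when reading from the left.
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


# Addresses decoded from the binary strings quoted in the post.
gateway = "192.168.10.1"
vip = "192.168.10.131"

print(to_binary(gateway))
print(to_binary(vip))
print(common_prefix_bits(gateway, vip))
```

Running the same function against other candidate addresses shows how many leading bits each one shares with the gateway.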

 

What would be a valid IP address that lets us keep the VIP address as the polling address?

 

Cheers


CPU and Memory for Cisco Switch Stack

I am not able to view CPU and Memory for the switch stack members. I get the below information:

 

I checked the "Learn more" article and found the below OIDs mentioned in the article:

 

After which I created the pollers in UnDP and added the 2 OIDs:

But I am still not getting the CPU and memory information. Can anyone suggest how I can proceed with this?

July .NET Patches and SolarWinds/SolarWinds Agents

After installing, and then uninstalling the July Microsoft patches around .NET Framework, we have been dealing with some serious instability in our environment.  If you aren't familiar with the patches, they're documented here:

Advisory on July 2018 .NET Framework Updates · Issue #74 · dotnet/announcements · GitHub

 

Microsoft released these, we installed, they pulled them and then released another one to fix the issues that were found, but then said they did not think that it fixed everything on the 2008 R2 servers (we have two in our environment - one being the core Orion server, along with 9 2012 R2 servers).  We have since uninstalled all of the patches from our environment, but still experience the issues.

 

The issue we are seeing is that the businesslayerhost process is crashing very often on our pollers, and a ton of apps (mostly the ones monitored on our agent-based machines) are going into an unknown state continuously throughout the day - about 1,000 out of the 8,000 total.  The event log errors we are seeing are at the bottom of this message.  My question is: are you guys aware of these patches causing instability with SolarWinds?  What about on the agent side?  I know the agent relies on .NET Framework, as it installs it during the installation process if it isn't already there.  Given the way we are seeing the issues on our pollers, it almost makes me think that we are having issues communicating with the agents, thus causing the unknown app numbers to bounce around all day as the pollers have trouble getting the data in time.  I believe all of our agent-managed machines still have these patches, even though they are all 2012 R2 and up.

 

For reference, here is the version(s) we are at:

 

Errors:

Application: SolarWinds.BusinessLayerHost.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Exception Info: System.InvalidOperationException

   at SolarWinds.BusinessLayerHost.BusinessLayerHostService+<>c__DisplayClass25_0.<CheckPlugins>b__0(System.Object)

   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)

   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()

   at System.Threading.ThreadPoolWorkQueue.Dispatch()

   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

 

Faulting application name: SolarWinds.BusinessLayerHost.exe, version: 2017.1.5300.1698, time stamp: 0x58ac4615

Faulting module name: KERNELBASE.dll, version: 6.3.9600.18938, time stamp: 0x5a7dd8a7

Exception code: 0xe0434352

Fault offset: 0x00015ef8

Faulting process id: 0x1d0c

Faulting application start time: 0x01d42e91b9959d28

Faulting application path: C:\Program Files (x86)\SolarWinds\Orion\SolarWinds.BusinessLayerHost.exe

Faulting module path: C:\WINDOWS\SYSTEM32\KERNELBASE.dll

Report Id: 00ec59ef-9a86-11e8-80fd-e4115bafdd78

Faulting package full name:

Faulting package-relative application ID:

SQL AlwaysOn High Availability "Not Synchronizing/Critical" state monitoring

Hi All,

 

I would like to create an alert that monitors the SQL AlwaysOn High Availability "Not Synchronizing" state. Can you please help me set this up?

SQL errors in the logs.

Hi all,

 

Question: I'm seeing a lot of errors saying that an item with the same key already exists, or that a primary key couldn't be updated due to duplicates, and things of this nature. What's the best way to find the cause and get it to stop?

Here is an example:

Violation of PRIMARY KEY constraint 'PK__#DownTim__913A95523B22EC15'. Cannot insert duplicate key in object 'dbo.#DownTimeEntitiesToMerge'. The duplicate key value is (Jun 26 2018 3:31PM, 10645, Orion.ADM.NodeInventory).

 

Thanks,

leandro

Do you have any Palo Alto firewalls? Fill out this survey for up to 3,000 points!

Hello THWACKers!

 

The User Experience (UX) team is doing some investigation into Palo Alto firewalls. We're interested in learning a bit about your current Palo Alto firewalls and what tool(s) you're using to monitor/manage them.

 

For filling out this quick 10-minute survey, you'll get 500 points. As an added bonus, sending over examples of your Palo Alto configurations will get you up to 2,500 more points to use in the THWACK® store!

 

Click here to take our survey!

Port Security Violation Alerting

I've just added port security configuration to the interfaces on our Cisco switches. Can anyone explain the best way to create alerts for any port security violation?

Would the best way be to create a poller with the MIB for port security? Would I then need to monitor the port with UDT or NPM? Would this be enough to see the alerts for port security violations, or do I also need to create an alert to view the error-disabled ports easily?

 

 

 

NPM 12.3 - Unable to install Additional Polling Engine due to Patch Manager 2.1.5?

Trying to replace an NPM 12.3 additional polling engine.  The primary polling engine was reinstalled last week on a fresh Windows Server 2016 box.  Just finished updating a fresh Windows Server 2016 server for the APE.

 

I downloaded the poller installer from /Orion/Admin/Details/Engines.aspx.  When I run the installer, the Install Report will not allow me to continue.

 

DESCRIPTION    The following products on the primary Orion server cannot be installed: PM 2.1.5

RESOLUTION     Update the products on your primary Orion server to the latest versions

 

PM 2.1.5 is the latest version available to download.  The primary Orion server is running the PM web console only.  The actual PM server is entirely separate.

Install Report error for PM 2.1.5


Information from Solarwinds Database

Hello,

 

I would like to retrieve the number of Microsoft licenses that are in use across our server infrastructure.

Would any of you good people know which database table holds this kind of information?  I found a table called 'Licensing_LicenseAssignments' which gave me license information, but only for the polling engine.

 

Thanks

Anita Roberts

Create Node Alert

Hi,
I want to create 3 alerts:

Conditions

 

1 . Gateway Down

Gateway = Down

Switch = Up

Access Point = Up

Send only Alert Gateway

----------------------------

2. Switch Down

Gateway = Up

Switch = Down

Access Point = Down

Send only Alert Switch

-----------------------------

3. Access Point Down

Gateway = Up

Switch = Up

Access Point = Down

Send only Alert Access Point
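The three conditions above amount to "alert only on the most upstream device that is down". A minimal Python sketch of that decision logic follows; it is not Orion alert configuration, and the function name `alert_to_send` and the status strings are hypothetical, chosen only to mirror the list.

```python
from typing import Optional

UP, DOWN = "Up", "Down"


def alert_to_send(gateway: str, switch: str, access_point: str) -> Optional[str]:
    """Return the one device to alert on, or None if everything is up.

    Devices are checked from upstream to downstream, so a down gateway
    hides a down switch, and a down switch hides a down access point.
    """
    chain = (
        ("Gateway", gateway),
        ("Switch", switch),
        ("Access Point", access_point),
    )
    for name, status in chain:
        if status == DOWN:
            return name
    return None


# The three cases from the post, in order.
print(alert_to_send(DOWN, UP, UP))
print(alert_to_send(UP, DOWN, DOWN))
print(alert_to_send(UP, UP, DOWN))
```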

 

I'm a newbie.
Any help would be greatly appreciated.

 

Thanks

Abay

What We're Working on for NPM (Updated June 1st, 2018)

NPM 12.3 has shipped and we're hard at work building the next release.  Here's what we're working on, in no particular order.

 

  • Cisco ACI Monitoring
  • Remote Collector - New, agent based collector for distributed environments and hybrid deployments
  • Next Generation Orion Mapping - First version delivered in NPM 12.3 via Orion Platform 2018.2.  Working on the next version.
  • Centralized Upgrades
  • Website & Database Performance Improvements
  • Windows Device Guard Support
  • SAML Authentication Support
  • Replace syslog/trap with the functionality from our new Log Manager product

Muting alerts via the Orion API

I am currently working on a python script to mute alerts for a specific node using the orionsdk.

 

swis.invoke('Orion.AlertSuppression', 'SuppressAlerts', uri, start, end)

 

After running my script I can see the following in the SolarWinds audit log, but the node continues to alert. Has anyone experienced this issue?

 

User suppress changed the schedule for muting alerts on opskzlp333  to 7/28/2017 1:15:00 PM - 7/28/2017 2:12:00 PM.
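For comparison, here is a hedged sketch of how the arguments for the call above might be prepared. It rests on two assumptions that are not confirmed anywhere in this post: that the `SuppressAlerts` verb expects a list of entity URIs rather than a single string, and that it expects the start and end times in UTC. That is how I recall the OrionSDK documentation describing it, so verify against your version. The helper names, the example URI, and the `swis` parameter (an already connected `orionsdk.SwisClient`) are illustrative.

```python
from datetime import datetime, timedelta, timezone


def build_suppress_args(node_uri: str, start: datetime, minutes: int):
    """Return (uris, start_utc, end_utc) for a SuppressAlerts call.

    Assumption: the verb wants a list of URIs and UTC timestamps.
    A naive `start` is interpreted as the machine's local time.
    """
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=minutes)
    return [node_uri], start_utc, end_utc


def mute_node(swis, node_uri: str, start: datetime, minutes: int):
    """Invoke the verb used in the post with the prepared arguments."""
    uris, start_utc, end_utc = build_suppress_args(node_uri, start, minutes)
    return swis.invoke("Orion.AlertSuppression", "SuppressAlerts",
                       uris, start_utc, end_utc)
```

Comparing the times written to the audit log with the UTC values this produces is one way to check whether a time zone offset is involved.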

Route Change Alerting

Hi Team,

Has anyone been successful at alerting on route changes, such as when a route is withdrawn from the RIB, other than route flaps? Thanks!

Regards,
Tim

Node Downtime with Duration and Minimum Length Filtering

**REQUIRES ORION PLATFORM 2018.2 OR ABOVE**

 

I assembled this from a much older SQL report, converted it to SWQL, and then added some more intelligence so you can filter it by outage duration and search by device name. It also has a way of letting you know when nodes have been down so long that they have aged out of the events table.

Based on popular request, I figured it was time to put it out here to make it easier for the Thwackers to find and use.  This is intended to be used inside the Custom Query resource.


 

select n.caption as [Device]
-- shows the current status icon
, '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device]
-- makes a clickable link to the node details
, n.DetailsUrl as [_linkfor_Device]
-- shows the timestamp of the down event; if there is no timestamp, it says the event was greater than the number of days in your event retention settings
, isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event]
-- shows the timestamp of the up event, unless the object is still down
, isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event]
-- figures out the minutes between the down and up events; if the object is still down it counts from the down event to now, and displays 99999 if we cannot accurately determine the original downtime
, isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETUTCDATE())),99999) as Minutes


from orion.nodes n
left join (SELECT    
 -- Device nodeid used for our join   
 StartTime.Nodes.NodeID     

 -- Down Event time stamp in local time zone    
 ,ToLocal(StartTime.EventTime) AS [Down Event]      
 -- Up Event time stamp in local time zone    
 ,(SELECT TOP 1    
 ToLocal(EventTime) AS [EventTime]    
 FROM Orion.Events AS [EndTime]    
-- picks the first up event that is newer than the down event for this node
 WHERE EndTime.EventTime >= StartTime.EventTime   
-- EventType 5 is a node up 
 AND EndTime.EventType = 5    
 AND EndTime.NetObjectID = StartTime.NetObjectID    
 AND EventTime IS NOT NULL    
 ORDER BY EndTime.EventTime    
 ) AS [Up Event]      
-- This is the table we are querying    
FROM Orion.Events StartTime      
-- EventType 1 is a node down
WHERE StartTime.EventType = 1        
) t2 on n.NodeID = t2.nodeid


-- this is how I catch nodes that are down but have aged out of the events table
where (n.status = 2 or t2.nodeid is not null)


-- If you want to filter the results to only show outages of a minimum duration uncomment the below line
--and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) >  60


-- if you want to use this query in a search box of the Custom Query resource uncomment the below line
--and n.Caption like '%${SEARCH_STRING}%'


order by t2.[down event] desc

 

-Marc Netterfield

    Loop1 Systems: SolarWinds Training and Professional Services
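The Minutes column and the optional minimum-duration filter in the query above can be sketched in Python for anyone post-processing the results outside the Custom Query resource. This is an illustration of the logic as the comments describe it, not a replacement for the SWQL: the function names are made up, it counts whole elapsed minutes (SWQL's MINUTEDIFF may round differently), and it assumes all timestamps are in the same time zone.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Sentinel the query displays when the down event has aged out of the events table.
UNKNOWN_MINUTES = 99999

Outage = Tuple[str, Optional[datetime], Optional[datetime]]


def outage_minutes(down: Optional[datetime], up: Optional[datetime],
                   now: datetime) -> int:
    """Minutes between down and up; still-down nodes count up to `now`."""
    if down is None:
        return UNKNOWN_MINUTES
    end = up if up is not None else now
    return int((end - down).total_seconds() // 60)


def filter_outages(rows: Iterable[Outage], min_minutes: int,
                   now: datetime) -> List[Tuple[str, int]]:
    """Keep only outages longer than `min_minutes`, like the optional filter."""
    result = []
    for device, down, up in rows:
        minutes = outage_minutes(down, up, now)
        if minutes > min_minutes:
            result.append((device, minutes))
    return result
```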

Are your Orion server and SQL database server in the same Active Directory domain?

All Groups with count of members in title

All,

 

I saw a post for a feature enhancement where the group names would include a count of the members in parentheses. The requester seemed to have been able to do this with SWQL, but my SWQL skills are limited. I'm wondering if anyone has seen this or has a solution that provides output similar to this:

 

 

This would be really useful in our environment, as a quick reference to the node levels within the groups.

I was referencing this thread in Thwack when I came across it: Display Count of Members Per Group

but the example SWQL was never posted...

 

Appreciate any help on this.

Thanks

SteveT

Can someone help me with database waits? I have trouble tracking potential weakness in our db.

Hi,

 

Parallelism is a SQL setting we've been experimenting with for a while.  No matter which option we pick, we always end up with CXPACKET waits, and sometimes they last 10 seconds or longer, indicating potential delays in execution. The WRITELOG wait time is also constantly high. I'm not sure whether queries are suspended too long waiting for resources, but I'm trying to determine how to better test the endurance and performance of my database so I can tweak its weak spots and bring it up to optimal performance.

 

It's SQL 2014 with the latest updates.

Windows 2012 R2 Datacenter edition.

24 vCPUs

100 GB of RAM

On a brand new, low-use UCS host. (Database is virtual)

Connected to a VMAX60 EMC SAN appliance. VMDKs optimized for the fast policy. The LUN is primarily all SSDs with a small portion on 10k SAS spindles. 30 Gig Fibre Channel to the SAN; the LUN is not shared with any high-resource application, and the storage guys tell me there is barely anything happening on these disks other than our database.

 

When database maintenance runs, or if you're running diagnostic reports, we see the number of waiting tasks shoot up to 20 thousand, stay there for about 1 second, and then drop to around 20 to 40 waiting tasks, which is normal for our database. I know the database is only as fast as its weakest link, and I'm trying to determine how I can test it to find that weakest link. I get the feeling something isn't right, but I don't know enough to accurately troubleshoot and test this, and I need help.

 

Any help is appreciated. 

Question regarding organization and load balancing.

Hi,

 

I'm trying to figure out a better way to organize our SolarWinds deployment. I've made a feature request, which I'm trying to build some momentum behind, for an auto load balance feature. Take discoveries, for example. If you don't actively select the poller you want to discover to, all discoveries go to the primary, as it is the default. If we have a new guy not familiar with SolarWinds, or someone else assisting (which happened to us), you end up with an overwhelmed primary and additional pollers that have no devices assigned to them. If we had an option to choose between manual and auto, with auto being the default, the system could run a process similar to database maintenance where it decides, based on load numbers, which device goes where. This way you would never have to worry about an overwhelmed poller again.
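To make the idea concrete, the "decide based on load numbers which device would go where" step could be a simple greedy assignment: each new node goes to whichever poller currently has the lowest load. The Python sketch below is only an illustration of that idea; the poller names, the load numbers, and the notion of a per-node cost are all hypothetical, and this is not something Orion exposes today.

```python
import heapq
from typing import Dict, List, Tuple


def assign_to_pollers(poller_loads: Dict[str, int],
                      new_nodes: List[Tuple[str, int]]) -> Dict[str, str]:
    """Greedily assign each node to the currently least-loaded poller.

    poller_loads maps poller name -> current load (e.g. element count).
    new_nodes is a list of (node name, estimated load the node adds).
    """
    # Min-heap keyed on load, so the least-loaded poller is always on top.
    heap = [(load, name) for name, load in poller_loads.items()]
    heapq.heapify(heap)

    assignment = {}
    for node, cost in new_nodes:
        load, poller = heapq.heappop(heap)
        assignment[node] = poller
        # Put the poller back with its load increased by the new node.
        heapq.heappush(heap, (load + cost, poller))
    return assignment


# Hypothetical example: an overloaded primary and two lightly used pollers.
loads = {"primary": 900, "ape1": 0, "ape2": 100}
nodes = [("sw1", 100), ("sw2", 100), ("sw3", 100)]
print(assign_to_pollers(loads, nodes))
```

With these made-up numbers none of the new nodes land on the primary, which is the behavior the feature request is after.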

 

But I'm here to ask anyone reading this: how do you organize your SolarWinds deployment or load balance it across the pollers without so much manual work? Is there a way to accomplish this? If the feature gets considered, that would be awesome and it would settle this question, but since the feature isn't developed yet I need to ask and get ideas and opinions.

I'm drowning in master sheets and spreadsheets, and the numbers are more likely to get skewed due to the number of spreadsheets I have to swim through. Then upper management is on my neck about variances in the numbers and what we can do to always have accurate counts and readings of SolarWinds health.

Secondly, I'm curious to know if there is a way to develop a report that would give me an accurate count of devices in ICMP mode. There is always a difference between the unknown count and the ICMP count, even though technically they are unknown and not SNMP devices. Then I come to find there are SNMP devices and ICMP devices that aren't unknown but are in the ICMP category. This is the point where I start seeing doubles and have to give up trying to get an accurate report. I've created and edited several of them, but none give me the expected output, and the variance in numbers is way too big.  Any help here would be great.

 

Lastly, if I have a switch for example that is part of a name change project, but the ip stays the same, will solarwinds automatically scan and change the name accordingly or does it stay the same. I lean towards there being a change but am not sure.

 

 

Appreciate any help in advance. Thank you!

Anyone know how to get a better handle on UCS monitoring?

Hello!

 

So I'm the single administrator here where I work and I've been trying to tackle this one till I burned myself out thinking about it day and night for weeks on end. Took a long break and now I'm burning my brain again and still not getting much done.

 

1. We have a need to monitor UCS, but UCS monitoring seems limited from an NPM perspective.

2. When we add an FI, we can see all of the information but it isn't adding up. For example, in the UCS object of the node details, where you can see the chassis and blades, any blade you pick redirects you to the KVM on that infrastructure. What we are really trying to accomplish is to map it so that when we click on a blade it takes us to device details and node information instead of redirecting to the KVM. I understand that the chassis and blades connect to the KVM and that's how they are controlled. However, there has to be a way to either have SolarWinds go through the KVM and get us the blade details or find a way to skip the KVM and map to the device.

3. We want to monitor faults and failures, service profiles, network latency, problems between the FI and the network, etc. We want to deep dive into UCS monitoring.

4. The call to SolarWinds was uneventful. I was just provided with a KB article, and although the tech attempted to help, even he couldn't give me more information than the KB provided.

 

I just get the feeling that SolarWinds placed this just to be able to say they do UCS monitoring but they never expanded on this capability to perfect the monitoring around it.

 

But I'm happy to hear what the community can offer in terms of advice for UCS monitoring and how we could reach our goals or get as close as possible to what we are looking for.

 

Thanks in advance for any help.
