Alert Not Displaying SWQL Correctly

October 16, 2018, 5:33 am

We recently upgraded to NPM 12.2, and one of our alerts is no longer displaying correctly. The intent is to send an email to a group with the node name and the percent memory usage on the node. Our message is configured as follows:

--------------------------------------------------------------------------------------

The Memory on ${NodeName} is currently ${Node.PercentMemoryUsed}

--------------------------------------------------------------------------------------

We want the email to read:

--------------------------------------------------------------------------------------
The Memory on NODE is currently 93 %
--------------------------------------------------------------------------------------

Instead, it reads:

--------------------------------------------------------------------------------------
The Memory on NODE is currently ${Node.PercentMemoryUsed}
--------------------------------------------------------------------------------------

According to this article, Node.PercentMemoryUsed is a valid node property, but maybe I have the syntax wrong.

I am new to SolarWinds and SQL/SWQL, so any help would be greatly appreciated.

↧

Issue with alert raising

October 12, 2018, 6:10 am

Hello

I've created a map of my site. Each building is represented and if I click one of them I can focus on a list of servers hosted in the relative IT room.

When a server is down, the red status is raised and on my map, the "red" status is appearing.

But when a component of the server is faulted, the red status is not propagated to the main map.

How can I raise a visual alarm at the highest level when a component of a server is faulted, not only when the whole server is down?

Thank you for helping

↧

Uplink port details

October 15, 2018, 9:56 pm

Hi Guys,

I wanted to know if I can get details of the uplink ports present on a switch using NPM or NTM.

↧

July .NET Patches and SolarWinds/SolarWinds Agents

August 7, 2018, 2:40 pm

After installing, and then uninstalling the July Microsoft patches around .NET Framework, we have been dealing with some serious instability in our environment. If you aren't familiar with the patches, they're documented here:

Advisory on July 2018 .NET Framework Updates · Issue #74 · dotnet/announcements · GitHub

Microsoft released these, we installed, they pulled them and then released another one to fix the issues that were found, but then said they did not think that it fixed everything on the 2008 R2 servers (we have two in our environment - one being the core Orion server, along with 9 2012 R2 servers). We have since uninstalled all of the patches from our environment, but still experience the issues.

The issues we are seeing are that the businesslayerhost process is crashing very often on our pollers, and that a ton of apps (mostly the ones that monitor our agent-based machines) are going into an unknown state continuously throughout the day: about 1,000 out of the 8,000 total. The event log errors we are seeing are at the bottom of this post. My question is: are you aware of these patches causing instability with SolarWinds? What about on the agent side? I know the agent relies on .NET Framework, as it installs it during the installation process if it isn't already there. Given the way we are seeing the issues on our pollers, it almost makes me think that we are having trouble communicating with the agents, which causes the unknown app numbers to bounce around all day as the pollers struggle to get the data in time. I believe all of our agent-managed machines still have these patches, even though they are all 2012 R2 and up.

For reference, here is the version(s) we are at:

Errors:

Application: SolarWinds.BusinessLayerHost.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Exception Info: System.InvalidOperationException

at SolarWinds.BusinessLayerHost.BusinessLayerHostService+<>c__DisplayClass25_0.<CheckPlugins>b__0(System.Object)

at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)

at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()

at System.Threading.ThreadPoolWorkQueue.Dispatch()

at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Faulting application name: SolarWinds.BusinessLayerHost.exe, version: 2017.1.5300.1698, time stamp: 0x58ac4615

Faulting module name: KERNELBASE.dll, version: 6.3.9600.18938, time stamp: 0x5a7dd8a7

Exception code: 0xe0434352

Fault offset: 0x00015ef8

Faulting process id: 0x1d0c

Faulting application start time: 0x01d42e91b9959d28

Faulting application path: C:\Program Files (x86)\SolarWinds\Orion\SolarWinds.BusinessLayerHost.exe

Faulting module path: C:\WINDOWS\SYSTEM32\KERNELBASE.dll

Report Id: 00ec59ef-9a86-11e8-80fd-e4115bafdd78

Faulting package full name:

Faulting package-relative application ID:

↧

What We're Working on for NPM (Updated June 1st, 2018)

March 30, 2015, 1:55 pm

NPM 12.3 has shipped and we're hard at work building the next release. Here's what we're working on, in no particular order.

Cisco ACI Monitoring - See NPM Monitor Cisco ACI and Support of a Cisco ACI networks in Network Performance Monitor.
Remote Collector - New, agent based collector for distributed environments and hybrid deployments. See NPM Micro Polling Engine.
Next Generation Orion Mapping - First version delivered in NPM 12.3 via Orion Platform 2018.2. Working on the next version. See Network Atlas Update?, Network Atlas Overhaul, and Granular Link Speeds in Network Atlas.
Centralized Upgrades - See Easier Upgrades/Update/Hotfixes for Sites with Additional Pollers and Web Servers, and Modify Orion Platform to display all recommended hot fixes and Buddy Drops as alerts in the top of NPM's header when they're not already installed. Then make the upgrades happen with one click.
Website & Database Performance Improvements
Windows Device Guard Support
SAML Authentication Support - See Support SAML.
Replace syslog/trap with the functionality from our new Log Manager product - See One alerting engine.

↧

Deploying Virtual Appliances

July 5, 2012, 2:30 pm

Are you comfortable with deploying virtual appliances?

We're discussing ways to deploy new products that may interact with your NPM (and other SolarWinds) deployments and one of our options is a virtual appliance.

↧

1-minute max to tell us about your IT troubleshooting pain points

December 24, 2014, 9:10 am

Tell us your biggest pain point when troubleshooting IT issues, e.g. users complaining about slow or broken access to applications.

Select your biggest pain point from this list.

Of course any additional comment to this page would be great (e.g. second biggest, pain point not listed below...)

↧

Automated port shutdown - what would be your reasons?

January 13, 2014, 1:00 pm

There was a feature request for UDT recently for an automated port shutdown if the connected endpoint is on the "black list".

What other reasons would you have for wanting to shut down a port? Let's let our imagination go wild and imagine events like a user device infected by a non-removable virus, peer-to-peer software installed, or a firewall disabled. Feel free to vote for Other and leave a comment. By liking specific comments you can vote for other ideas.

Thanks!

↧

Cisco ASA as a Default gateway?

May 29, 2015, 1:40 am

More than 1 million Cisco ASAs have been built and deployed around the world in the past years (good job, Cisco!). Because they do not support the SNMP protocol properly, they often complicate our lives when configured as default gateways, preventing products like SolarWinds User Device Tracker from monitoring what's behind them. Do you use your ASAs as default gateways too? If so, what percentage of your network do they route?

↧

Does your organization use Mobile Device Management or MDM?

March 27, 2013, 6:30 am

With the explosion of smart mobile devices, both tablets and phones, end users want to connect them for email, remote access (VPN), wireless, etc. How are you handling this? Does your organization even see this as a problem? What level of control are you looking for? Do you care if apps like DropBox are installed on them and uploading corporate files without your knowledge?

When I talk about Mobile Device Management I mean specifically things like:

Remotely manage and set up end users' mobile devices (i.e. Google Android or Apple iOS) with the settings for services such as email access, wireless, VPN, etc.
Enforce corporate security standards and best practices, such as ensuring a passcode is set or disabling certain features or functions on the device, like the camera
Ability to remotely wipe a device if lost or stolen
Report on types of devices, the hardware and software installed on them in your environment
Track users or stolen devices via GPS

↧

Do you need a Network Access Control solution?

January 14, 2014, 3:15 am

I wonder how many of you already have some version of a Network Access Control (NAC) solution in place. Theoretically speaking, Mobile Device Management (MDM) is just a part of a bigger NAC, so I would like to ask you to vote "yes" even if you have only the MDM solution in your organization. If voting for "We don't have one but plan to buy", could you please leave a comment on why it is important for you to have one?

↧

Citrix Netscaler 12.1 multiple partitions monitoring

October 16, 2018, 11:17 pm

Hello.

We need to monitor multiple partitions on our new Citrix Netscalers 59xx.

Can anyone tell me how I can do it?

As I understand it, I can't create more than one device with the same IP for monitoring in NPM, but the point of monitoring multiple partitions is to split them by SNMP community only.

How can I create more than one device in NPM with different SNMP communities?

Thank you.

↧

Telco Systems’ T-Metro 7224 CPU, POWER LEVELS, VOLTAGE, TEMPERATURE MONITORING

October 3, 2018, 1:47 am

I am using T-Metro 7224 switches from Telco Systems on my fiber network infrastructure. The problem is that I am not getting the CPU, memory, voltage, temperature and power level information from these devices.

SNMP is working fine, but NPM is not retrieving the CPU, memory, voltage, temperature and power level information.
I am using NPM version 12.3.

↧

Can't save Map Group

October 17, 2018, 2:40 am

I created a group with all my devices.

When I navigate to Home --> Groups, I select the group, click Map in the left pane, and I can see this map:

I didn't create the map; it showed up automatically.

The problem is that I am trying to move the devices to the correct positions (the network topology is a ring) and then save the map. The only option is saving it as a new group, so I save it with a new name.

The new group is created, but when I come back to see the map it appears disorganized again.

How can I save the map?

↧

Using Your Custom HTML Resource To Build A Better Way To Navigate Your Custom Views

September 25, 2018, 7:50 am

↧

Checking Cisco switches for SSH capability

July 7, 2017, 7:58 pm

I have over 500 Cisco switches, all different models, that I need to check to see whether they support SSH. I'm currently using Telnet and need to move to SSH. Thanks.

↧

Multiple IP Addresses for Border Devices.

October 17, 2018, 7:15 am

Dear colleagues,

What workaround or workflow do you use for monitoring border devices with multiple IPs?

In my case, we have two or more channels for each branch (a main one plus one or more reserves, each of which can serve as the main channel if there is an issue on the Internet provider's side), and I want an alert about a device being down only when all channels/IPs are unavailable.

I've seen the KB post "Multiple IP Addresses for One Node" (SolarWinds Worldwide, LLC. Help and Support). It's nice, but from my point of view a more modern and flexible approach would be the ability to add more than one IP address to a device.
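To make the desired rule concrete, here is a minimal sketch of the "down only when every address is down" logic. It is plain JavaScript for illustration, not an Orion feature or API; the function name `isDeviceDown` and the input shape (a map of IP address to a reachable flag) are assumptions.

```javascript
// Hypothetical input: an object mapping each IP address of one border
// device to true (reachable) or false (unreachable).
function isDeviceDown(reachabilityByIp) {
  const results = Object.values(reachabilityByIp);
  // With no addresses at all we cannot claim the device is down.
  if (results.length === 0) return false;
  // The device counts as down only if every single address is unreachable.
  return results.every((reachable) => reachable === false);
}
```

With one main and one reserve channel, `isDeviceDown({ "10.0.0.1": false, "10.0.1.1": true })` returns `false`, so no alert would fire while the reserve is still answering.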

↧

Using Your Custom HTML Resource To View Events On A Timeline

October 10, 2018, 11:19 am

Well, here we are, again, with another example of how we can view the same old boring data via a SWQL query and a little bit of JavaScript.

Previously, I posted an example of how to build an events calendar, which would populate some pie charts once you clicked a date on the calendar. Using Your Custom HTML Resource To Properly Display SWQL Query Results

Then, there was the post showing how to rebuild the manage views page, making it easier to navigate and manage your various viewgroups. Using Your Custom HTML Resource To Build A Better Way To Navigate Your Custom Views

Now, if all goes well, this post should show you an example of how you can view your events across a timeline. While this will work for any event with a start and end time, I am going to specifically use the NCM Scheduled Jobs for the example data.

More information on the timeline being used in this example can be found here: Charts | Google Developers

Just the same as previous versions of similar tools, mblackburn has done most of the legwork to get a fairly decent template built out, allowing me to "plug and play" this code to display our data in different ways.

ESTIMATED TIME TO INSTALL/PERFORM MODIFICATION: <5 Minutes

DIFFICULTY LEVEL: 1 - Youngling

1. Youngling (Easiest/Most Basic; no coding experience required, no config wizard required, no system restart required, no system downtime.)
2. Padawan (Easy/Basic; no coding experience required, possible config wizard required, possible system/services restart required, limited/no downtime.)
3. Jedi Knight (Moderately Difficult/Advanced; some coding experience required/recommended, config wizard required, possible system/services restart required, limited/short duration downtime.)
4. Jedi Master (Most Difficult/Advanced; advanced coding experience required, config wizard required, system/services restarts required, 30+ minutes downtime/maintenance window recommended, and other things that I do not even know I would need to know, required...)

This mod was performed on the following SolarWinds environment/versions: (It may or may not work on other versions)

WHAT DO YOU NEED?

1. Access to manage views in your Orion environment
2. Orion web server must be able to access "https://www.gstatic.com/charts/loader.js"
3. The "Custom HTML" resource added to a view
4. A working method to copy text from the attached file
5. A working method to paste text, copied from #4 above, into a custom HTML resource, from #3 above.

Basically, you should only need to open the attached file (JS_Timeline-003.txt) in a text editor, copy the contents, and paste them into the "Custom HTML Resource" on one of your views within your SolarWinds environment.

Before we begin, (while the following is certainly a good practice, it actually doesn't apply to this customization.)

PLEASE don't edit the system files/database without backing them up first.

If you see a friend or co-worker making changes without backing up first, please alert the authorities.

Friends don't let friends mod without backups.

"If it's not broke, then fix it until it is."

-The smartest person ever

In The Beginning:

As with most things, we need to start somewhere. That somewhere is the default NCM "Jobs List" page. (/Orion/NCM/Resources/Jobs/JobsList.aspx)

Here is what our example data looks like in its default form. Pretty simple.

While there are many things I would like to see on this page (better/static filtering, independent "Last Date Start", "Current Duration", "Min Duration", "Max Duration", and "Avg Duration" columns, as well as a handful of other basic/standard data to assist with the overall system management), this post is really only going to focus on the basic DAILY view.

I say DAILY as this method is fairly simplistic, and is NOT very good at accommodating anything out of the ordinary. In other words, if you are manually starting/stopping your NCM jobs, then you might see some odd-looking data on the timeline. As far as I can tell, there is not an easy way to pull the basic data. For instance, I would expect the start time, end time, and next start time to exist for easy consumption. In reality, however, we only have easy access to the end time, which is actually labeled as "LastDateRun", and the next start time, which is labeled as "NextDateRunUtc". Unfortunately, all of the job schedule data is stored in an XML-formatted column of the Cirrus.NCM_NCMJobsView table, which I do not know how to easily access and parse within the scope of this example.

Having said that, we should still be able to get the previous start time, by manipulating the next start time and previous end time with a few date thingies, a couple of number thingies, and a pinch of "I hope nothing changes"... Needless to say, I think this should work for the most part, but just know there may be issues with displaying some data, depending on some wonky date calculations.
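For anyone curious what those "date thingies" might look like, here is a rough sketch of one way to estimate the previous start time from the two values we do have. This is NOT the query from the attached file; the function name `estimatePreviousStart` and the fixed 24-hour interval are assumptions for illustration only.

```javascript
// Hypothetical helper: estimate when a recurring job last started.
// Assumes the job starts at the same time of day on every run.
const DAY_MS = 24 * 60 * 60 * 1000;

function estimatePreviousStart(nextDateRunUtc, lastDateRun, intervalMs = DAY_MS) {
  const next = new Date(nextDateRunUtc).getTime();
  const end = new Date(lastDateRun).getTime();
  if (Number.isNaN(next) || Number.isNaN(end)) {
    throw new Error("Both timestamps must be valid dates");
  }
  // Step back whole intervals from the next scheduled start until we reach
  // the latest scheduled start that is not after the last end time.
  const steps = Math.max(1, Math.ceil((next - end) / intervalMs));
  return new Date(next - steps * intervalMs);
}
```

A manually started job still breaks the assumption, because its real start does not sit on the schedule, which is exactly the "I hope nothing changes" part.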

Okay, let's get back on track here...

The SWQL Query: (Without the JavaScript)

Before we get into the JavaScript, let's make sure we are able to see the data we want/expect to see. We already know what the default jobs list page shows us, so we are going to build this query to show us the important data, and get rid of everything else.

The full SWQL Query can be found at the following link: (SWQL Query To Display Basic NCM Scheduled Job Stats (Job Start, End, Duration, Next Start))

Here is what the results of our SWQL query would look like:

Nothing fancy, but we can clearly see all of the enabled jobs (and none of the jobs which are disabled), when each job started (or so we think... this is where the magic calculations begin), when each job ended, how long each job took to complete, and a brief summary of when each job will run again. Because the query depends on the difference between the next run date and the last end date, manually running the job would alter that time frame, which would produce incorrect data. (A problem to solve at another time?)

The JavaScript:

Now that we are able to see the data we need, we can dump it into some JavaScript stuff, and hopefully produce a decent looking timeline.

Again, this timeline is NOT perfect... at all... but it should provide a decent way to visualize which jobs are taking long, or which are overlapping with other jobs.
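As a rough idea of the data shaping involved, a timeline chart wants one row per job: a label, a start date, and an end date. The sketch below is not the code from the attached file; the function name `toTimelineRows` and the input shape (`name`, `start`, `end` as ISO 8601 strings) are assumptions.

```javascript
// Hypothetical row shape coming back from the query:
// { name: "Job name", start: "ISO 8601 string", end: "ISO 8601 string" }
function toTimelineRows(jobs) {
  return jobs
    // Turn each job into [label, startDate, endDate].
    .map((job) => [job.name, new Date(job.start), new Date(job.end)])
    // Drop rows whose dates failed to parse or that run backwards in time.
    .filter(([, start, end]) =>
      !Number.isNaN(start.getTime()) &&
      !Number.isNaN(end.getTime()) &&
      start.getTime() <= end.getTime());
}
```

In the browser, rows shaped like this could be handed to `addRows` on a `google.visualization.DataTable` that has a string column followed by two date columns, once the loader has pulled in the `timeline` package.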

A few things to note:

The 2 jobs on the far left of the screenshot only run once per week. (Just a heads up as to why they were not invited to join the party with the rest of the data)
The timeline on the bottom repeats 12AM/6/12PM/6/etc. because, again, the 2 jobs on the far left only run once per week, which happens to be 3 days ago. (When the jobs run closer together, the timeline will automatically become more granular.)
The timeline does not display full datetimes in the hover over pop-up box.
The "Duration" value, within the timeline pop-up, comes from the way the chart processes the start and end dates. (The SWQL query inside the JavaScript is a slightly modified version of the query mentioned above. The query above calculates and formats the value for duration differently).

**I have updated the attached file to include a temporary workaround for displaying jobs which were manually started. Manually starting a job will cause the new start time to appear as a date/time in the future, which will break the chart. If/when the new time is set as a future date, later than the end time, this workaround will simply use the last end time as both the start and end time. This will allow the job to be added to the chart at the time it completed running. The next time the job runs at its normal schedule, it should show up with its normal start and end times... Hopefully**
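The idea behind that workaround boils down to a single comparison. Here is a minimal sketch of it, written from the description above rather than copied from the attached file; the function name `normalizeJobWindow` is made up for illustration.

```javascript
// Hypothetical guard: if the calculated start lands after the end time,
// use the end time as both start and end so the chart does not break.
function normalizeJobWindow(start, end) {
  const s = new Date(start);
  const e = new Date(end);
  // A start later than the end means the estimate was thrown off
  // (for example by a manual run), so collapse the bar to one point.
  return s.getTime() > e.getTime() ? { start: e, end: e } : { start: s, end: e };
}
```

A job whose calculated start is tomorrow at 2 AM but whose last end was today at 2:30 PM would come back with both values set to today at 2:30 PM.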

Well, there you go. It's not rocket surgery, or anything fancy. But, when the planets are aligned, and your luck is full, it just might work well enough for you to use once or twice.

What's in your widget? Please post below and let us know.

For more ways to customize your SolarWinds environment, make sure to check out this link, by CourtesyIT

How to do various customizations with your Solarwinds

Thank you,

-Will

↧

Alert for response time - use average or current?

June 6, 2014, 2:54 pm

Working on an alert for response time using NPM v10.7. If I am correct, the warning and critical response time thresholds that I can set on the Node edit properties page are related to "current" response time. If my trigger condition is pointed to the variable Response Time and the Trigger must be sustained for 4 minutes (2 polls), why am I not seeing any alerts when I have a node that has been above the critical threshold for several polls in a row?
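To pin down what "sustained for 4 minutes" could mean, here is a small sketch of one possible reading: the condition has to hold across polls spanning the whole window. This is only an illustration, not NPM's actual alert evaluation logic; the function name `isSustained` and the sample shape are assumptions.

```javascript
// Hypothetical samples, oldest first:
// [{ time: <ms since epoch>, value: <response time in ms> }, ...]
function isSustained(samples, threshold, sustainMs) {
  let breachStart = null;
  for (const { time, value } of samples) {
    if (value > threshold) {
      // Remember when the current run of breaching polls began.
      if (breachStart === null) breachStart = time;
      if (time - breachStart >= sustainMs) return true;
    } else {
      // Any poll at or under the threshold resets the timer.
      breachStart = null;
    }
  }
  return false;
}
```

Under this reading, with 2-minute polling, two breaching polls are only 2 minutes apart, so a 4-minute window would need a third one before anything triggers.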

↧