Node Uptime Report
Custom SQL Query to a Non-Solarwinds DB in the UI
Has anyone built a custom SQL query that uses a piece of SolarWinds node data (e.g., IP address) to pull back data from an external DB? We're redesigning our node details page and I would love to be able to pull in the open incidents from our ServiceNow instance for the node in focus, but I don't really have any idea where to start.
Anyone?
Using Your Custom HTML Resource To Properly Display SWQL Query Results
Is the Orion application (NPM/NCM, etc.) compatible with Linux OS?
Hi All,
Do you have any idea whether Orion applications like NPM, NCM, SAM, and so on are currently compatible with Linux OS?
If yes, please share some information about it.
Thanks
Orion Platform 2018.2 Improvements - Chapter One
The time has come again for another exciting rundown of some of the improvements and enhancements coming your way in the next major installment of the Orion Platform. For those who may not be familiar, Orion is the foundational component upon which product modules such as Network Performance Monitor (NPM), Server & Application Monitor (SAM), and many others are built. Platform capabilities are available to, or can be leveraged by, modules which run atop the Orion Platform. In most cases, those enhancements are available regardless of which Orion module(s) you are running, such as PerfStack. In others, it may be something which individual modules can extend to utilize for their own purposes, such as the Orion Agent, which has been the basis for delivering amazing new capabilities from NetPath and QoE in NPM, to Application Dependency Mapping (ADM) and IaaS monitoring in SAM.
UPS Monitoring
Several years ago I created a Universal Device Poller (UnDP) for monitoring APC SmartUPS devices, and to this day it remains one of the most downloaded UnDPs for NPM, if not the most popular. Universal Device Pollers are an incredibly powerful feature of NPM, allowing you to monitor virtually anything about a device which is managed via SNMP. However, there comes a time when certain functionality becomes so ubiquitous that it makes sense to promote it to native functionality of the monitoring solution and not require users to create it themselves. So in this 2018.2 release of the Orion Platform included with NPM 12.3, that's precisely what we set out to do, while also making some improvements along the way.
If you haven't already done so, you'll want to start by adding your APC UPS equipment to Orion. You can do so individually using the 'Add Node Wizard' [Settings > All Settings > Add Node], or in bulk using Sonar Discovery [Settings > All Settings > Discover Network]. If you are adding the devices using the 'Add Node Wizard', you will notice a new option listed for your APC UPS equipment entitled 'UPS'. Checking the box next to this option will enable UPS polling for this device.
List Resources | Power Control Unit Status Resource | UPS Firmware Version |
---|---|---|
![]() | ![]() | ![]() |
Once you've successfully completed the 'Add Node Wizard' and navigate to the 'Node Details' view of your newly added UPS device, you will notice a newly added resource entitled 'Power Control Unit Status'. This resource reflects the most important information about your UPS device, including things such as its overall status, time on battery, and the battery's current charge capacity. This information can, as you would expect, be utilized in Alerts to notify you of things such as when the UPS is on battery, if a battery needs replacing, or if the battery is reaching an unsafe operating temperature. You may also notice that the 'Software Version' field in the 'Node Details' resource now accurately reflects the firmware version installed and running on the UPS.
Currently, this new capability is limited exclusively to APC (American Power Conversion) SmartUPS Uninterruptible Power Supplies (UPS) containing Network Management (AKA Web/SNMP) cards. This feature does not support APC's unmanaged BackUPS series, nor does it yet support other UPS vendors, such as Eaton, Tripp Lite, or CyberPower. At least for now, we recommend using the Universal Device Poller to monitor similar metrics for UPS vendors other than APC. We will, however, be keeping a close eye on the NPM feature request forum to gauge interest in native support for other UPS vendors.
Linux/Unix Load Average
In a similar vein to UPS monitoring discussed above, we learned from speaking with our customers over the years, as well as from those participating in the Orion Improvement Program, that monitoring Load Average on Linux and Unix systems ranks among the most popular uses of the Universal Device Poller. In our enduring pursuit to deliver unexpected simplicity to our customers, we realized that collecting these important metrics natively was something which was long overdue.
Beginning in Orion Platform 2018.2, and included with NPM 12.3, Load Average is collected automatically for any node which supports it. This is typically any Linux-based operating system, but can also extend to FreeBSD, AIX, and other Unix-like OSes. The Load Average metrics are collected for nodes monitored via the Orion Agent, as well as those managed agentlessly via SNMP. There are really no additional steps required if you added your nodes using the default selection. Since Load Average has a direct correlation to CPU utilization, it's intuitively tied to the existing 'CPU & Memory' option shown under 'List Resources'. When selected, Load Average statistics are collected automatically if the node being monitored supports them.
List Resources - CPU & Memory | Load Average Resource |
---|---|
![]() | ![]() |
On the 'Node Details' view of your Linux servers, you will notice a snazzy new resource entitled 'Load Average' which displays the one-minute, five-minute, and fifteen-minute load average of the machine being monitored. Because Load Average metrics are tightly coupled to the number of CPU cores, we extended Orion's alerting to allow you to combine Load Average statistics with CPU count within your Alert Trigger so you can be notified when your system is under strain.
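To illustrate why Load Average is usually read together with CPU count, here is a minimal sketch in Python of a per-core normalization. It is not Orion's alert logic; the function names and the threshold of 1.0 are assumptions made for the example.

```python
def load_per_core(load_average: float, cpu_count: int) -> float:
    """Divide a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def is_under_strain(load_average: float, cpu_count: int,
                    threshold: float = 1.0) -> bool:
    """Return True when the per-core load exceeds the threshold.

    The default threshold of 1.0 is a hypothetical value chosen for
    illustration, not a SolarWinds default.
    """
    return load_per_core(load_average, cpu_count) > threshold
```

With these definitions, a load average of 8.0 counts as strain on a 4-core machine (2.0 per core) but not on a 16-core machine (0.5 per core), which is why the same raw number can mean very different things on different servers.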
Load Average has also been added to the default PerfStack metrics for the node, meaning if you click on the 'Performance Analysis' button on the 'Management' resource of the 'Node Details' view for a Linux server, you'll be taken to PerfStack where these Load Average statistics are automatically prepopulated. Similarly, if you're already working in PerfStack, you can drag the node itself onto the chart area, and the Load Average statistics, as well as other default metrics for the node, will populate the PerfStack dashboard.
Group Availability
Ever since bshopp introduced us to Orion Groups back in NPM 10.1, we've heard from many of you that the manner in which availability is calculated for these groups just didn't jibe with how you think about availability in your environment, nor did it provide a valuable measurement for use in your SLAs. Sadly, Group Availability in Orion is calculated binarily. Put simply, the group is either 100% 'Up' or it's 100% 'Down' regardless of the number of members contained within the group. What this usually meant was, so long as at least one member in the group was 'Up', the availability of the group was 100%. That remained true even if there were 99 other things 'Down' in that group at that time. I know, it sounds odd when you say it aloud or even when you're writing it down, but that's how it's been for years and somehow we've managed to muddle through. Well, in this release of the Orion Platform, no longer will you be forced to just muddle through. Today we heed your cries!
Rather than turn the world on its end, causing lots of confusion and alert storms in our wake, we left the legacy Group Availability metric in place, untouched. I know that will come as a big relief to those of you who have grown dependent upon this method of calculating availability and have built reports and alerts around this metric. What we chose to do instead is introduce a new Group metric entitled 'Group Members Availability', which, as one would expect, properly and accurately calculates the availability of the group based on its members. This includes nested groups as well.
This new 'Group Members Availability' metric appears automatically on the 'Group Details' view upon group creation. We will also start calculating this new metric upon upgrade to Orion Platform 2018.2 if you already have existing groups. So there's really nothing you need to do. We even include a new out-of-the-box report we refer to as 'Members Based Group Availability Report - Last Month' which serves as an example for how easily this metric can be added to your own reports compared to some of the
some had attempted to use in the past. You can even leverage this new Group Members Availability metric in your alerting conditions with no fuss!
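The difference between the two metrics can be sketched in a few lines of Python. This is only an illustration: the legacy function follows the "at least one member up" behavior described above, while the members-based function assumes a simple share of members that are up at one point in time. The exact formula Orion uses (including how nested groups are weighted) is not stated here, and the function names are made up for the example.

```python
from typing import Sequence


def legacy_group_availability(member_up: Sequence[bool]) -> float:
    """Binary availability: 100% if any member is up, otherwise 0%."""
    return 100.0 if any(member_up) else 0.0


def members_based_availability(member_up: Sequence[bool]) -> float:
    """Assumed members-based availability: percentage of members up."""
    if not member_up:
        raise ValueError("group has no members")
    return 100.0 * sum(member_up) / len(member_up)


# One member up and 99 members down, as in the example above.
group = [True] + [False] * 99
legacy = legacy_group_availability(group)
members_based = members_based_availability(group)
```

For that group the legacy calculation yields 100.0 while the members-based calculation yields 1.0, which is the gap the new metric is meant to close.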
And More!
There's still plenty more we've managed to jam-pack into this release of the Orion Platform that we're particularly excited about and would love to get your feedback on. Stay tuned to learn about some of the mapping improvements jblank has whipped up and the many usability enhancements serena has crammed into this release, such as sexy new hovers, a new PerfStack widget, and additional improvements that we've made to ensure your next upgrade experience is great!
Unable to add single node (Vyatta firewall) to Solarwinds NPM
Please refer to the attached screenshot.
NPM 12.3 Orion 2018.2 Upgrade Feedback
What has your upgrade to NPM 12.3 on Orion Platform 2018.2 looked like? We on the product manager team would like to hear about it all: the good, the bad, and the ugly! As a starting point, here is a quick getting started blog post on upgrading to the 2018.2 Orion Platform: Preparing for the Upgrade to 2018.2
What We're Working on for NPM (Updated June 1st, 2018)
NPM 12.3 has shipped and we're hard at work building the next release. Here's what we're working on, in no particular order.
- Cisco ACI Monitoring
- Remote Collector - New, agent-based collector for distributed environments and hybrid deployments
- Next Generation Orion Mapping - First version delivered in NPM 12.3 via Orion Platform 2018.2. Working on the next version.
- Centralized Upgrades
- Website & Database Performance Improvements
- Windows Device Guard Support
- SAML Authentication Support
- Replace syslog/trap with the functionality from our new Log Manager product
CheckPoint firewall failover alert
Hi All,
I'm creating this document to help you with "how to create the CheckPoint firewall failover alert".
> Create one UnDP poller using the OID below and assign it to both the primary and secondary CheckPoint firewalls.
> Create an alert with the condition below.
This alert targets only the primary CheckPoint firewall, because when the primary firewall fails over to the secondary firewall, you will receive an alert like this on the console:
Alert Snapshot -
Here we have targeted only one node, but if you have multiple primary nodes, you can create a new custom property on the node, such as "Pri_CheckPoint", and mark it "YES" for all primary nodes.
You can then use the same logic in the alert created above, with "Node - custom property - Pri_CheckPoint - YES".
Now your alert is ready to fire, but in the "Trigger action" tab you can use the correct NetPerfMon event log setting, so you and your team get the correct naming details of the alert in the console.
What happened to ${VolumePercentUsed} and ${VolumeDescription}
Morning.
I have generated a custom alert and am looking for the following two variables. I have an old alert from a little while back with the variables and copied it from there, but it didn't port correctly. So my plan was to re-add the volumepercentused, but I can't seem to locate said variables. I tried ${N=SwisEntity;M=VolumePercentUsed}, but that hasn't worked either.
Has it gone forever, or is it something that has had a name change?
Which configurations are required for logging in to Huawei ONT HG8245Q2 through web pages?
Which configurations are required for logging in to Huawei ONT HG8245Q2 through web pages?
How to fetch CPU alerts from solarwinds Database
Hi All,
Could anyone tell me how to fetch CPU alerts from the SolarWinds database with a SQL query?
Regards,
Rakesh
Solarwinds DB fragmentation error
Hello guys,
I am using SolarWinds NPM 12.2 in my network, and I am getting this warning event: "indexes with fragmentation over 90% found during DB maintenance". I don't get this event daily, but I get it most of the time.
As far as I know, auto index defragmentation is available in this version, and it is enabled.
I need your help/suggestions to resolve this.
NPM Installation in a VMWare environment
My group is tasked with moving from a set of isolated enclave networks (basic description below - CURRENT) to a managed data center (basic description below - END-STATE)
CURRENT
- Our team directly manages the VMWare hosts, storage, virtual machines, and network support infrastructure
- Each enclave has its own IP address scheme (managed by us) as well as the secure connections to other specific sites (no internet access)
END-STATE
- We will not be able to manage the VMWare hosts, the back-end storage, nor the VMWare network components
- We will manage the VMWare datastores we have been given access to, as well as the virtuals we build and manage.
- Each enclave has its own IP address scheme (provided by the data center personnel), but we will continue to manage the secure connections between other specific sites outside the data center.
Questions for anyone who might be listening :^)
- The book for Solarwinds components states that we need a dedicated NIC on the VMWare hosts for Solarwinds traffic, especially if you are running NTA. This is possible in our current environment, but will not be in our new one. Does anyone have any suggestions for effectively maintaining communications between polled objects and the server?
- Although each enclave is a very small testing environment (<10 servers in each), we are planning to expand our use of Solarwinds to include all of the following tools: NPM/NTA (already installed), SAM, NCM, and Kiwi SysLog. Would it be more effective to run everything on separate servers, or would one beefy server suffice?
NOTE: Each isolated network is separate from each other, so we will not have to worry about aggregating data from each enclave to a central management server. We just want to be able to monitor each enclave's status.
Thanks and let me know if anything is not clear in my question.
Thanks!
How do I update the MIB Database on an APE?
Okay, this should be fairly easy and straightforward, right? You go out to the customer portal, download the latest MIB.cfg file and then replace the current MIB.cfg on the poller with the new one.
I've done all that and rebooted the server and I'm still getting an alert that the database on the APE is over 300 days old.
Am I doing something wrong or do I need to be more patient with SolarWinds to see the update?
NPM 12.2 Export to PDF format
Hi All,
When I try to export the Interface utilization reports, the PDF format is no longer an option; I can only see a printable version. Can anyone assist with a workaround?
Do you have Meraki gear? Does Solarwinds do enough with it? What would you like to use Solarwinds for when managing Meraki?
We've adopted Meraki to replace ASA 5505's in SOHO solutions, and I'm replacing 60 of them now.
But I find insufficient Solarwinds support for backing up MX65's, Z3's, etc. Nor do I find excellent snmp-v3 information about monitoring Meraki gear with NPM.
I've searched Thwack for "meraki" and reviewed lots of hopeful, optimistic entries, plenty of requests for SW to better manage or integrate with Meraki gear. But I've found no answers to my needs. I can only discover my gear with plain snmp-v2, and I can't back it up at all with NCM.
Nor do I see a way to pull Netflow info from Meraki for NTA to display.
Do you use Netflow on your Meraki gear and have it successfully showing up in NTA? If so, how?
Have you found a way to back up Meraki gear with NCM? If so, how?
Have you successfully implemented snmp-v3 between Meraki and NPM? If so, how?
Your stories, answers, solutions, workarounds, frustrations, and suggestions are requested!
Troubleshooting the High Availability Module
Since the new HA module came out I once in a while have run into cases where my servers would be bouncing back and forth causing me some headaches. This thread is mostly just a place I wanted to write down some of what I saw and my thoughts so maybe it can be helpful to others who are struggling with their HA environments.
I dug through the database and the logs and was able to resolve the issues in my specific cases but I never came across a central "reason for failover" kind of value that was actually useful. Have I just missed it or does this not currently exist?
The situation I ran into most recently was that my primary poller was pretty overloaded while we shuffled some nodes around fixing an issue with an APE. Nothing was crashing but I suspected that since the system was slow to respond that was triggering the failovers, but the failover itself was WAY more disruptive to my work than it would have been to just let me finish what I was doing and move nodes back off the primary engine to their home APE. Eventually after several failovers interrupting me I went ahead and killed the HA pool so I could finish what I was doing and the system stayed up just fine while overloaded, just ran a bit slow until we finished. When I went to dig around and see if there was a conclusive indicator telling me that we had definitely breached some specific threshold or something I couldn't find one so it prompted me to post this thread.
I can get a few metrics that might be minimally useful from the HA_PoolsView such as when the pool last changed over but nothing pointing at why it triggered.
In most cases if I look at the HA_PoolMembersView you would think that StatusMessage might have something useful, but it is typically null.
HA_PoolMembers has columns called ReasonofFail and StatusMessage, but they are also null in most cases.
HA_Audit has events for each failover event letting me know the xx server is up or down, but still no indication of why.
Then we get into parsing the log files themselves.
\ProgramData\Application Data\SolarWinds\Logs\HighAvailability has two files, HighAvailability.Service and HighAvailability.KeepAlive. The keepalive file is generally empty when I've looked at it, and the service file is a little dense. By looking for "WARN" lines I was able to spot some cases where "Call to Monitor() timed out." so that told me the initial idea was probably correct.
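For anyone who wants to repeat that search without scrolling through the whole file, here is a rough Python sketch. The only things taken from the post are the "WARN" marker and the "Call to Monitor() timed out." message; the function names are made up, and the log path is left as a parameter because the exact file name and extension will depend on your install.

```python
from typing import Iterable, Iterator, List


def find_warn_lines(lines: Iterable[str],
                    needle: str = "Call to Monitor() timed out.") -> Iterator[str]:
    """Yield lines that contain both 'WARN' and the given message text."""
    for line in lines:
        if "WARN" in line and needle in line:
            yield line.rstrip("\n")


def scan_log(path: str) -> List[str]:
    """Read a log file and return the matching WARN lines.

    'path' is whatever the HighAvailability.Service log is called on
    your server; it is deliberately not hard-coded here.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_warn_lines(handle))
```

Counting how many matches fall inside the window of a failover would be one way to check whether slow responses line up with the flapping.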
I can see in the LOCAL POOL SNAPSHOT some intervals defined like so:
IntervalMemberDown: '00:00:32', IntervalPoolTask: '00:00:08', IntervalSuicideRule: 00:00:29
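If you want to compare those intervals across servers or over time, a small Python sketch like the following can turn that snapshot line into durations. It assumes the values are always in hh:mm:ss form, quoted or unquoted as in the line above; the function name is made up.

```python
import re
from datetime import timedelta
from typing import Dict

# Matches "Name: '00:00:32'" as well as the unquoted "Name: 00:00:29".
_INTERVAL = re.compile(r"(\w+):\s*'?(\d{1,2}):(\d{2}):(\d{2})'?")


def parse_intervals(text: str) -> Dict[str, timedelta]:
    """Extract name -> duration pairs from a pool snapshot line."""
    result = {}
    for name, hours, minutes, seconds in _INTERVAL.findall(text):
        result[name] = timedelta(hours=int(hours),
                                 minutes=int(minutes),
                                 seconds=int(seconds))
    return result


snapshot = ("IntervalMemberDown: '00:00:32', IntervalPoolTask: '00:00:08', "
            "IntervalSuicideRule: 00:00:29")
intervals = parse_intervals(snapshot)
```

Here `intervals["IntervalMemberDown"].total_seconds()` gives 32.0, so the values can be compared directly against how long the primary was slow to respond.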
Is there anywhere you could tweak those thresholds to make the system maybe a bit more tolerant of situations where the primary poller is CPU bottlenecked for a bit? Like I mentioned earlier, a string of back and forth failovers is much more disruptive to operations than just having a slow console while I was in the middle of a big change.
I did come across an executable in the main Orion folder called HAEnableDisable, which probably could have been helpful. You need to launch it from the command line and it has flags for /info /disablepool /enablepool /disableha /enableha
Anyway, hope you guys find something useful in all that for future reference.
Loop1 Systems: SolarWinds Training and Professional Services
- LinkedIN: Loop1 Systems
- Facebook: Loop1 Systems
- Twitter: @Loop1Systems
Alerts on Multiple Conditions
We have a few different locations we connect through Site-To-Site VPN. We monitor some equipment at each site but if the VPN tunnel ever goes down, we get several alerts that all the equipment is down when it really isn't, it just can't be monitored across the VPN tunnel. What is the best way to set up alerts that they will only trigger if the conditions are met and the VPN tunnel is not down (we have objects that are just a ping across the tunnel to see if it's up or down)?
I know you can use complex conditions in alerting but to me, it only looks like you can set up the alert if multiple objects are triggered, not the other way around where one object is triggered and the other is not.
Any more information needed, please let me know.
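To make the intended logic concrete, the condition described above boils down to "the device is down AND the tunnel check is not down". A minimal Python sketch of that rule, separate from Orion's alert builder and with made-up names:

```python
def should_alert(device_down: bool, tunnel_down: bool) -> bool:
    """Alert on a device only when the VPN tunnel check itself is still up."""
    return device_down and not tunnel_down
```

So a down device behind a working tunnel triggers the alert, while the same device behind a down tunnel stays quiet and only the tunnel's own alert would fire.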