Hi! I have an advanced alert that is triggering, and the system's CPU Load status shows 100%. However, when we look at the server itself and its real-time process explorer, CPU usage is about 16%.
I suspect the CPU load is being calculated using the delta (rawstatus) counter values instead of the new (total) values for these devices.
Current environment: Orion Platform 2013.2.1, SAM 6.0.2, IPAM 4.1, NCM 7.2.2, NPM 10.6.1, NTA 3.11.0, IVIM 1.9.0
Background: I upgraded NPM from v10.5 to 10.6.1 a few weeks ago. Upgrading to 10.7 isn't an option right now, since it would require moving the server to x64 and upgrading SQL Server.
This is the only server with this error, although we have 5 other Oracle DB servers on Linux.
Analysis: After reading
from 2009, I have this additional information:
- Pollers are N.Cpu.SNMP.HrProcessorLoad. (I haven't played around with changing these.)
- Nodes are using SNMPv2c
- Server has 4 CPUs
- I added the UnDP from the article above for all of the Oracle DB servers on Linux. It collects the cpuRaw counters with Unit left blank and TimeFrame=None. The delta (the difference between polls) is stored in the status and rawstatus columns, and the new value is stored in the total column. The CPU load for all of these servers appears to be calculated the same way (with incorrect values). On the server showing 100%, the ssCpuRawIdle value (1.3.6.1.4.1.2021.11.53) stays the same from poll to poll, so the delta stored in the tables is 0. This delta of 0 makes the CPU load calculation come out to 100%, because it appears to use the delta values instead of the new values (the total column in the table).
- When I view the UnDP (Universal Device Poller) values, especially the NetSNMP transforms, they appear to be calculated correctly (using the totals) in the UnDP application. However, other views, such as the node's web page showing the poller values, display the delta values, which is misleading. This is a secondary design issue.
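To illustrate what I think is happening, here is a minimal Python sketch. It is not Orion's actual formula, which I don't know; the function name and all counter values are hypothetical, chosen only to show how an idle delta of 0 would produce 100% while the totals give something close to what the server itself reports.

```python
def cpu_load_percent(user, nice, system, idle):
    """Busy CPU percentage from four tick counts (either deltas or totals)."""
    total = user + nice + system + idle
    if total == 0:
        return None  # no ticks observed, so the load is undefined
    return 100.0 * (total - idle) / total

# Hypothetical ssCpuRaw* readings from two consecutive polls.
# The idle counter does not move between polls, as on the problem server.
# Counter wrap is ignored in this sketch.
prev = {"user": 1000, "nice": 0, "system": 500, "idle": 8500}
curr = {"user": 1120, "nice": 0, "system": 540, "idle": 8500}

# What I suspect is being used: the per-poll deltas (status/rawstatus columns).
delta = {name: curr[name] - prev[name] for name in curr}
load_from_deltas = cpu_load_percent(**delta)

# What the UnDP transforms appear to use: the new values (total column).
load_from_totals = cpu_load_percent(**curr)

print(f"from deltas: {load_from_deltas:.1f}%")  # from deltas: 100.0%
print(f"from totals: {load_from_totals:.1f}%")  # from totals: 16.3%
```

Note that a figure computed from the totals is an average since the counters started, not the load over the last polling interval, so the two numbers would differ somewhat even on a healthy server.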
Other articles:
July 2012, Feb 2010.
Are my suspicions correct? Do I have to 'turn off' the CPU monitoring for the Linux servers and handle these via a SAM monitor or has this been fixed in a later version?