Hello,
I work in a NOC dept of 4 people (1 manager, 3 employees) that covers 7221 elements (1135 nodes). When I started in my current position I was told our dept had been handed the current Orion environment already set up, and since then I have tried to tweak things to make our day-to-day job easier. But I always wonder if there is a better way. Let me take a moment to describe our process. Here are our current versions (I plan to upgrade the ones that are behind soon): Orion Platform 2014.1.0, IPAM 4.1, NCM 7.3, NPM 10.7, NTA 4.0.1.
Our "Network Summary Home" page is our primary tool for seeing the overall network status. Emailed alerts are usually what prompt us to look at the screen and act on outages. On the home page we have the following modules:
Column 1
Hardware Health Overview
All Nodes - Collapsed listing State, then City, then node
Top XX nodes by avg cpu load
Column 2
Nodes with Problems
Search Nodes
Interfaces with High utilization
Top 60 Nodes by avg response time
We use the tabs on the left side for various things like events, various top 25, link, maps, etc.
----------
Our IT dept has its own ticket system, which we use for outages and saturation issues.
Our NOC keeps track of current outages in a Google spreadsheet that we email to most of IT every 2 hours to keep them informed. It lists the node or office name, staff count, group responsible for the outage, issue description, a detailed and time-stamped description of steps taken (mirroring the info in the associated ticket), ticket number, date/time of initial outage, and date/time of resolution. We archive these spreadsheets and occasionally pull stats from them.
Our NOC priorities are outages (which involve troubleshooting directly with ISPs and offices), then saturation issues, and then projects.
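To make the spreadsheet layout concrete, here is a rough Python sketch of one row and of the open-outage summary we send out. The class, field, and function names are made up for illustration, and nothing here talks to Orion or to our ticket system; it only models the columns described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class OutageRecord:
    """One spreadsheet row (hypothetical field names)."""
    site: str                  # node or office name
    staff_count: int           # number of staff at the affected site
    responsible_group: str     # group responsible for the outage
    issue: str                 # short issue description
    ticket_number: str         # matching ticket in the IT ticket system
    started: datetime          # date/time of initial outage
    resolved: Optional[datetime] = None  # None while the outage is open
    steps: List[Tuple[datetime, str]] = field(default_factory=list)

    def add_step(self, when: datetime, note: str) -> None:
        """Append a time-stamped note describing a step taken."""
        self.steps.append((when, note))

    @property
    def is_open(self) -> bool:
        return self.resolved is None


def open_outages(records: List[OutageRecord]) -> List[OutageRecord]:
    """Return unresolved outages, oldest first."""
    return sorted((r for r in records if r.is_open), key=lambda r: r.started)


def summary_lines(records: List[OutageRecord]) -> List[str]:
    """Build one plain-text line per open outage for the status email."""
    lines = []
    for r in open_outages(records):
        last = r.steps[-1][1] if r.steps else "no steps logged yet"
        lines.append(
            f"{r.site} | staff {r.staff_count} | {r.responsible_group} | "
            f"ticket {r.ticket_number} | since {r.started:%Y-%m-%d %H:%M} | {last}"
        )
    return lines
```

Calling `summary_lines(records)` on the current list of rows would give the body of the 2-hour email; resolved rows drop out automatically because they have a `resolved` time.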
--------
My manager and I have done what we can to make this process as streamlined as possible while still giving his manager what he wants, and both the needs and what we monitor are pretty dynamic.
We were just prompted to license our products with SolarWinds, so naturally I'm asking myself: am I using the best tools for our needs?
For one, I would love it if I could stop using Google Docs and use something built into Orion. That way maybe I could also automate the process of emailing managers every 2 hours each day.
But the biggest change I would love, if at all possible, is to list the current open alerts right on the home page of our console (which we do today using Nodes with Problems), but with a column next to each one showing acknowledgement from the NOC that one of us is working on that node, plus a current status of the work done. Is that a pipe dream, can this be done already, or can it be done with another product?
Node XYZ Down - Down Since - Acknowledged by NOC - Current Status of work being done
Kind of like that...
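In case it helps show what I mean, here is a tiny Python sketch of how one of those rows would be put together. The function name and arguments are made up for illustration, and this is not anything Orion provides as far as I know; it only formats the four columns I listed.

```python
from datetime import datetime
from typing import Optional


def format_alert_row(node: str,
                     down_since: datetime,
                     acknowledged_by: Optional[str] = None,
                     status: str = "") -> str:
    """Format one home-page row: node, down since, acknowledgement, work status."""
    # Show who acknowledged the alert, or flag that nobody has picked it up yet.
    ack = f"Acknowledged by {acknowledged_by}" if acknowledged_by else "Not acknowledged"
    # Fall back to a placeholder when no work notes have been entered.
    work = status or "No status yet"
    return f"Node {node} Down - Down since {down_since:%Y-%m-%d %H:%M} - {ack} - {work}"
```

For example, `format_alert_row("XYZ", datetime(2015, 3, 2, 9, 15), "NOC", "ISP ticket opened")` (sample values, not real data) would produce a row with all four columns filled in, while leaving out the last two arguments would show the node as not yet acknowledged.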
Any tips and tricks or examples of other people's layouts/tools would be greatly appreciated. Hopefully I posted this in the right place. I'm just looking for some advice from those who want to share their best practices. Thanks in advance!