Despite the relative maturity of monitoring and systems management as a discrete IT discipline, I am asked - year after year and job after job - to give an overview of what monitoring is.
This document was my attempt to address that question in a more structured form.
Originally intended as a guide to help bring new team members (often fresh out of college or a technical program) up to speed quickly on monitoring concepts, this document (or portions of it) can serve as a good introduction for a variety of audiences.
Excerpt:
"If you have worked in the IT field for more than 15 minutes, the situation described above is neither unique nor rare, even if it is somewhat colorful. Systems crash unexpectedly, users make bizarre claims about how “the internet is slow”, and managers ask for historical statistics that leave you scratching your head wondering how to collect them in a way that is meaningful and doesn’t consign you to the hell of hitting “refresh” and writing down numbers on paper for half a day, just to get a baseline for a report.
The answer to all these challenges lies in effectively monitoring your environment – collecting statistics and/or checking for error conditions so that you can act or report effectively when needed."