We have an issue that occurs about once a week. It only involves Juniper EX3200/EX4200 switches, and only some of the ones we have deployed and monitored.
About once a week, we get an alert from Solarwinds that one of these devices is down (this ALWAYS occurs in the same order of devices, which makes it even stranger). When we check the device it's actually up and responding to pings from elsewhere, BUT it will not respond to pings from the Solarwinds platform itself.
I opened several tickets with Solarwinds and they told me that they just rely on the ping utility within Windows 2008, so if the device isn't pingable then it's got to be a Windows 2008 problem. I have also opened tickets with Juniper, and we have proven that the ping request is arriving at the EX switch and that the response is going back out. A Wireshark capture on the Windows server shows the ping going out but the reply never coming back.
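To check whether the replies go missing for Windows in general or only for the Solarwinds poller, pings could be logged from the same server by a small script that runs outside Solarwinds. This is only a rough sketch, assuming Python 3 is installed on the polling server. The function names, the CSV layout and the log path are made up for illustration.

```python
import csv
import platform
import subprocess
import time
from datetime import datetime


def build_ping_command(host, timeout_ms=1000, system=None):
    """Return a one-shot ping command line for the given OS."""
    system = system or platform.system()
    if system == "Windows":
        # -n 1: send one echo request, -w: timeout in milliseconds
        return ["ping", "-n", "1", "-w", str(timeout_ms), host]
    # Linux (iputils): -c 1: one echo request, -W: timeout in seconds
    return ["ping", "-c", "1", "-W", str(max(1, timeout_ms // 1000)), host]


def ping_once(host, timeout_ms=1000):
    """Return True if the OS ping utility exits with status 0.

    Caveat: Windows ping may also exit 0 when a router answers with
    "Destination host unreachable", so treat this as a rough signal.
    """
    result = subprocess.run(
        build_ping_command(host, timeout_ms),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_pings(hosts, path, interval_s=60, rounds=1):
    """Append one CSV row (timestamp, host, reachable) per host per round."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for i in range(rounds):
            for host in hosts:
                stamp = datetime.now().isoformat(timespec="seconds")
                writer.writerow([stamp, host, ping_once(host)])
            f.flush()  # keep the log usable if the script is interrupted
            if i < rounds - 1:
                time.sleep(interval_s)
```

Calling something like log_pings(["192.0.2.10", "192.0.2.11"], "ping_log.csv", rounds=1440) would log once a minute for a day (the addresses are placeholders, not our real switches). If this log shows failures at the same moments as the Solarwinds alerts, that would point at the server or the path rather than at Solarwinds.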
We have the latest Windows patches, and the server is considered "clean" from a software perspective (no viruses, of course). There is nothing in between on the network that could cause this issue, so we focused on the Windows server.
We bit the bullet, as per Solarwinds' suggestion, and blew away the Solarwinds server that does the polling. We rebuilt it and restored everything, and we have the same problem. The server now also has the latest network card drivers.
We have totally run out of ideas - frustrating for sure, and we have no idea what to do next. Solarwinds said they are bringing out a feature where a ping of the remote device isn't required (thank goodness, we were shocked that it was a requirement during our initial installation). This new "feature" would at least work around the issue.
It's also worth noting that while the pings fail, SNMP continues to poll data successfully. In addition, the alert triggers every time on the same initial device and then follows the same pattern through the other Juniper EX switches, at the same time interval.
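To put numbers on that pattern, the first-failure time of each switch in one incident can be turned into an order and an offset from the first device. This is a minimal sketch, assuming Python 3.7 or later. The function name is mine, and the timestamps and device names in the usage lines are invented samples, not our real alert data.

```python
from datetime import datetime


def failure_offsets(events):
    """Order devices by first-failure time and compute offsets.

    events: iterable of (iso_timestamp, device) pairs, one per device,
            taken from a single incident.
    Returns a list of (device, seconds_after_first_failure), sorted by time.
    """
    parsed = sorted((datetime.fromisoformat(ts), dev) for ts, dev in events)
    if not parsed:
        return []
    start = parsed[0][0]
    return [(dev, int((ts - start).total_seconds())) for ts, dev in parsed]


# Invented sample incident (hypothetical names and times):
sample = [
    ("2013-01-07T02:05:00", "ex-b"),
    ("2013-01-07T02:00:00", "ex-a"),
    ("2013-01-07T02:10:00", "ex-c"),
]
offsets = failure_offsets(sample)
```

Comparing the output of several incidents side by side would show whether the order and the step between devices really are identical each week, which is easier to hand to the vendors than a description.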
Thoughts? ;)