While doing an intensive NetApp and VMware health check for a customer, I found the “Ghost in the Machine”.
Cute, isn’t he? Well, not so much. If I had not widened my view range, I would never have found him!
No monitoring tool is perfect, because no environment fits the “best practices” mold. If every environment did, there would be no need for options, overrides, or administrators.
What caused an increase of more than 10,000% in running load? Did anyone feel this? Did anyone complain? Did it cause latency to spike?
Did this increase in aggregate load translate to a CPU spike? Did it come about because of a hot plex or raidgroup?
What do your metrics say about your workloads?
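To put that number in perspective, here is a minimal Python sketch of the arithmetic, and of why the view range matters. The sample values, the function names, and the choice of the median as the baseline are all my own assumptions for illustration; none of this comes from the customer's system or from any NetApp or VMware tool.

```python
from statistics import median


def percent_increase(baseline: float, peak: float) -> float:
    """Percent increase of peak over baseline (100.0 means the value doubled)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (peak - baseline) / baseline * 100.0


def find_spikes(samples, threshold_pct=10000.0):
    """Return (index, value, pct) for samples above the median by more than threshold_pct."""
    baseline = median(samples)
    spikes = []
    for i, value in enumerate(samples):
        pct = percent_increase(baseline, value)
        if pct > threshold_pct:
            spikes.append((i, value, pct))
    return spikes


# Hypothetical load samples, invented for this example.
load = [2.0, 2.1, 1.9, 2.0, 2.2, 250.0, 2.0, 1.8, 2.1]

narrow_view = find_spikes(load[:5])  # short window: the ghost stays hidden
wide_view = find_spikes(load)        # wider window: the spike shows up
```

A 10,000% increase means the peak is 101 times the baseline, not 100 times. With these made-up samples the narrow window reports nothing, while the wider one flags the single sample that sits far above the median.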
Off I go.