2012-04-13 09:57:32 by jdixon
I remember one day when I was trying to narrow down an application causing high load on an outlier within a fleet of servers. Nagios wasn't suitable for the task, as it only told me which hosts were currently spiking, not which ones had been spiking over a given window of time. And it certainly couldn't identify a particular host based on a performance visualization.
My Graphite wizard hat went on and I went to work, narrowing down the list of suspects using wildcards and visually inspecting each host's load profile. Within 5 minutes I found my suspect and basked in my glory.
Naturally my brilliance was short-lived.
As it turns out, Graphite already provides the superb mostDeviant filter. This function accepts an integer N, averages each series over the entire window, and returns the N metrics that deviate most from their own average.
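The ranking idea can be sketched in a few lines of Python. This is a minimal illustration, not Graphite's own code: the function name most_deviant, the dict-of-lists input and the host names are hypothetical. It assumes, as Graphite's documentation describes, that each series is scored by its spread around its own mean and that the N highest scores are kept.

```python
from typing import Dict, List, Optional


def most_deviant(series: Dict[str, List[Optional[float]]], n: int) -> List[str]:
    """Hypothetical helper: names of the n series that vary most around their own mean."""
    scored = []
    for name, values in series.items():
        points = [v for v in values if v is not None]  # Graphite series can contain gaps (None)
        if not points:
            continue  # skip series with no data at all
        mean = sum(points) / len(points)
        # Population variance around the series' own mean; ranking by variance
        # gives the same order as ranking by standard deviation.
        variance = sum((v - mean) ** 2 for v in points) / len(points)
        scored.append((variance, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most deviant first
    return [name for _, name in scored[:n]]


# Hypothetical load averages for three hosts; web03 is the noisy outlier.
load = {
    "web01": [0.5, 0.6, 0.5, 0.6],
    "web02": [0.4, 0.5, 0.4, 0.5],
    "web03": [0.5, 4.0, 0.6, 3.8],
}
print(most_deviant(load, 1))  # ['web03']
```

In Graphite itself you would apply the function to a wildcard target instead; in the 0.9-era releases the integer came first, e.g. mostDeviant(5, servers.*.loadavg.01) with a hypothetical metric path, while newer releases put the series list first. Check the function reference for your version.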
Take, for example, a one-hour trend from this week:
Apply the filter: