2012-04-10 00:41:02 by jdixon
I'd like to begin sharing more of my knowledge as it pertains to using Graphite in production. Most of these upcoming posts are bound to be of the "check out this cool function" variety, but hopefully you can stitch them together into something useful. Before I proceed, I'd like to thank Chris Davis and the team at Orbitz who started this incredible software project and released it to the open-source community. Without your work I'd be stuck using something... less awesome.
Today's tip comes courtesy of a combined effort by me and Michael Leinartas (@mleinart). I've used this particular combination of functions before to calculate the number of "events" in a series during a particular timeframe. Unfortunately I failed to record the query anywhere (pro-tip: save your best Graphite functions in a document or gist; you'll be glad you did), although I had a vague idea of the functions needed. Michael was kind enough to remind me of the order in which to chain them.
At $DAYJOB we use a large number of EC2 instances for various components. For the last few months we've been experiencing a high mortality rate with a particular type of instance, used in a particular component configuration. To support our research in tracking down possible kernel bugs we started submitting "annotation" metrics to the Graphite server. These are generally one-shot metrics that we render in Graphite using the drawAsInfinite() function. It lets us identify a particular moment in time by drawing a vertical line at the point on the X axis where the metric was recorded. This works very well for visualizing isolated events (server crashes, software deployments, etc.).
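For reference, a one-shot annotation metric of this kind could be submitted along the following lines. This is only a sketch: it assumes Carbon's plaintext listener (one "path value timestamp" line per data point, conventionally on port 2003), and the host name, metric path, and timestamp are made up for illustration.

```python
import socket
import time


def format_annotation(path, timestamp=None, value=1):
    """Build one line in Carbon's plaintext format: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_annotation(path, host="graphite.example.com", port=2003):
    """Send a single data point marking "something happened now".

    The host and port are assumptions; adjust them for your Carbon setup.
    """
    line = format_annotation(path)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


def infinite_target(path):
    """Wrap a metric path so values above zero render as vertical lines."""
    return f"drawAsInfinite({path})"


# Hypothetical metric path and timestamp for a crashed instance.
print(format_annotation("events.crash.web01", timestamp=1334018462), end="")
print(infinite_target("events.crash.web01"))
```

Only the formatting helpers run here; send_annotation() is left uncalled so the example works without a Carbon server to talk to.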
But what if you want to aggregate these events over time, to gauge the frequency of the series as a whole? For this we can chain the group(), sumSeries() and summarize() functions. The group() function, as its name implies, collects dissimilar metrics into a single series list. This is handy for passing several metrics as one argument to the next function in the chain, in this case sumSeries(). As you might expect, sumSeries() adds the series together point by point, producing a single series. Lastly, this is passed to summarize(), which aggregates the values into "buckets" of a specific time interval (summing them by default).
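To make the chaining concrete, here is a rough sketch that builds such a target string and mimics, in plain Python, what sumSeries() and summarize() do to the data. The metric paths, the "1d" interval, and the sample points are hypothetical, and the simulation is a simplification of Graphite's behavior rather than its implementation.

```python
from collections import defaultdict


def build_target(paths, interval="1d"):
    """Nest the calls: group() innermost, then sumSeries(), then summarize()."""
    grouped = "group(" + ",".join(paths) + ")"
    return f'summarize(sumSeries({grouped}),"{interval}")'


def sum_series(series_list):
    """Add the series point by point; None (no data) contributes nothing."""
    totals = defaultdict(int)
    for series in series_list:
        for timestamp, value in series.items():
            if value is not None:
                totals[timestamp] += value
    return dict(totals)


def summarize(series, bucket_seconds):
    """Sum the points that fall into each fixed-width time bucket."""
    buckets = defaultdict(int)
    for timestamp, value in series.items():
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))


# Hypothetical one-shot crash events: {unix_timestamp: value}.
web01 = {3600: 1, 90000: 1}
web02 = {7200: 1}

print(build_target(["events.crash.web01", "events.crash.web02"]))
print(summarize(sum_series([web01, web02]), 86400))
```

With these sample points, two events land in the first one-day bucket and one in the second, which is the kind of per-day count the chained query is meant to produce.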
In case you're wondering, I set Line Mode to Staircase Line and enabled Draw Null as Zero. I also enabled Area Mode (All). These are merely aesthetic preferences, but they help me to discern the daily mortality rate over this period.
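If you build graphs through the render URL rather than the Composer menus, the same display settings correspond to the lineMode, drawNullAsZero, and areaMode parameters. A minimal sketch, in which the host name, target, and time range are placeholders rather than the values used for the graph discussed here:

```python
from urllib.parse import urlencode


def render_url(base_url, target, since="-30d"):
    """Build a Graphite render URL with the display options described above."""
    params = [
        ("target", target),
        ("from", since),
        ("lineMode", "staircase"),    # Line Mode: Staircase Line
        ("drawNullAsZero", "true"),   # Draw Null as Zero
        ("areaMode", "all"),          # Area Mode: All
    ]
    return base_url.rstrip("/") + "/render?" + urlencode(params)


# Placeholder host and target; substitute your own.
url = render_url(
    "http://graphite.example.com",
    'summarize(sumSeries(group(events.crash.web01,events.crash.web02)),"1d")',
)
print(url)
```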