2012-06-01 11:40:25 by jdixon
This is one of my favorite graphs, and certainly one of the most underappreciated. Its simplicity belies its usefulness. This single chart gives me a holistic view of our metrics feed, writes to Whisper files, and general system health. At a glance I can correlate slow updates with a spike in Whisper file creations, or spot a backup that results in a higher PPU value. We use some of its targets with Nagios to monitor for metric feed issues. And it's always the first place I look whenever there's a whiff of Graphite problems.
Here is a recent one-hour snapshot:
And its corresponding 24-hour view:
The targets are relatively straightforward. We use the group function so that we can easily sumSeries multiple carbon cache daemons at once. In our installation we actually have eight carbon cache processes (and four relays), so this saves a lot of typing. The Points-per-Update (PPU), CPU and Creates are all rendered on the secondYAxis to keep them at a reasonable scale.
alias(color(sumSeries(group(carbon.agents.*.updateOperations)),"blue"),"Updates")
alias(color(sumSeries(group(carbon.agents.*.metricsReceived)),"green"),"Metrics Received")
alias(color(sumSeries(group(carbon.agents.*.committedPoints)),"orange"),"Committed Points")
alias(secondYAxis(color(sumSeries(group(carbon.agents.*.pointsPerUpdate)),"yellow")),"PPU")
alias(secondYAxis(color(averageSeries(group(carbon.agents.*.cpuUsage)),"red")),"CPU (avg)")
alias(secondYAxis(color(sumSeries(group(carbon.agents.*.creates)),"purple")),"Creates")
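If you'd rather script this than paste targets into the composer, here is a rough Python sketch of how the same targets could be turned into a render URL, plus a tiny helper in the spirit of our Nagios checks. The hostname, the function names and the warn/crit thresholds are all made up for illustration; the only things taken from Graphite itself are the /render endpoint with its target, from and format parameters, and the shape of the JSON output (a list of series, each with a "datapoints" list of [value, timestamp] pairs).

```python
from urllib.parse import urlencode

# Hypothetical Graphite host; replace with your own.
GRAPHITE_URL = "http://graphite.example.com"

# The six targets from this post, one string per target.
TARGETS = [
    'alias(color(sumSeries(group(carbon.agents.*.updateOperations)),"blue"),"Updates")',
    'alias(color(sumSeries(group(carbon.agents.*.metricsReceived)),"green"),"Metrics Received")',
    'alias(color(sumSeries(group(carbon.agents.*.committedPoints)),"orange"),"Committed Points")',
    'alias(secondYAxis(color(sumSeries(group(carbon.agents.*.pointsPerUpdate)),"yellow")),"PPU")',
    'alias(secondYAxis(color(averageSeries(group(carbon.agents.*.cpuUsage)),"red")),"CPU (avg)")',
    'alias(secondYAxis(color(sumSeries(group(carbon.agents.*.creates)),"purple")),"Creates")',
]


def render_url(targets, base=GRAPHITE_URL, window="-1hour", fmt="png"):
    """Build a /render URL with one 'target' parameter per target string."""
    params = [("target", t) for t in targets]
    params += [("from", window), ("format", fmt)]
    return base + "/render?" + urlencode(params)


def latest_value(series):
    """Return the newest non-null value from one series of Graphite's JSON output."""
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None


def nagios_status(value, warn, crit):
    """Map a value to a Nagios exit code, treating LOW values as the problem
    (e.g. Metrics Received dropping off). Thresholds are illustrative only."""
    if value is None:
        return 3  # UNKNOWN
    if value < crit:
        return 2  # CRITICAL
    if value < warn:
        return 1  # WARNING
    return 0      # OK


if __name__ == "__main__":
    print(render_url(TARGETS))                  # one-hour snapshot
    print(render_url(TARGETS, window="-24hours"))  # 24-hour view
```

For a check you would request format=json for a single target, feed the matching series to latest_value, and exit with whatever nagios_status returns. Which targets and thresholds make sense depends entirely on your own feed volume.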
What's your favorite graph? Is it something you'd be willing to share? Feel free to tweet me a gist with your graph configs and I'll post them on my blog.