Polling Graphite with Nagios

2012-05-31 20:37:00 by jdixon

I'm a big proponent of using Graphite as the source of truth for monitoring systems where polling host and service checks have traditionally been the norm. Realistically, this will take a long and gradual shift in philosophy by the larger IT community. Until then, we can still use Nagios and Graphite in tandem for powering more insightful checks of our application metrics.

There are actually a few different "check_graphite" scripts out there. The first one I saw announced publicly was Pierre-Yves Ritschard's check-graphite project. Shortly afterwards I published my own check_graphite script. Pierre's version is smaller but doesn't appear to automatically invert the thresholds (e.g. if critical is lower than warning). Otherwise you should be fine using either script; the remaining differences are mostly isolated to implementation details and default values. Since this is my blog, I'm going to use my script for this example. ;-)
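To make "inverting the thresholds" concrete, here is a minimal Python sketch of the idea. It is not taken from either script: the function name check_threshold and its arguments are hypothetical, and only the exit codes (0, 1, 2) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value, warning, critical):
    """Return a Nagios exit code for a metric value.

    Hypothetical helper for illustration only. When critical is lower
    than warning, the comparison is inverted so that lower values are
    treated as worse.
    """
    if critical < warning:
        # Inverted: alert when the value falls to or below a threshold.
        if value <= critical:
            return CRITICAL
        if value <= warning:
            return WARNING
        return OK
    # Normal: alert when the value rises to or above a threshold.
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK
```

With warning=80 and critical=90, a value of 95 is critical. With warning=20 and critical=10 (say, free disk percentage), the same function treats a value of 5 as critical without needing a separate flag.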

Read the rest of this story...

A Foolishly Sensible Proposal for Graphite

2012-05-30 22:33:26 by jdixon

Let me get one thing out of the way: I fucking love Graphite. No other piece of software I've used has returned as much getting-shit-done value for so little personal investment. It's a triumph of function and utility, designed to help users collect metrics, store metrics, and extrapolate from those metrics with as little pain as humanly possible. The criticisms and suggestions I present below are conveyed with the utmost respect for all of Graphite's current and past developers, and in particular, Chris Davis and the original team at Orbitz who built and released it as open source. None of the rest of this post should detract from how rewarding it is to work with this tool.

Read the rest of this story...

Taxes are Orthogonal to Wages

2012-05-17 08:16:24 by jdixon

I've been reading about taxing the rich and high unemployment and how the middle class is dying for as long as I can remember. What no one seems to be talking about is that these problems are orthogonal, not causal. Raising the taxes on billionaires is not going to buoy the middle class' ability to buy a new car. Nor will lowering taxes on the rich have the adverse effect. We're no more likely to see a trickle-down effect from the government raising taxes than we are from lowering them.

What we really should be asking is what we can do to motivate business owners to increase employee wages without regulation. I'm not sure there is a good answer for that. I think this is a systemic problem within our upper class, one tied to a sense of privilege and a lack of personal responsibility to community.

I don't have any answers. I just wanted to get these thoughts down and see what others think.

Organizing Your Graphite Metrics

2012-05-09 22:13:10 by jdixon

One of the most common questions I get from Graphite users is how best to name and/or organize metric paths. I don't have an exhaustive list of "best practices" but I'd like to share some basic insights I've accumulated.

Misaligned paths are ok. I used to be tempted to try and keep different paths aligned in order to ease correlation of related targets within a graph. Fortunately there are plenty of helpful aliasing functions (and wildcards) to help tame unruly paths.
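As a rough illustration of taming misaligned paths, the Python sketch below builds a render URL in which two targets of different depth are each aliased down to a single path node with Graphite's aliasByNode function. The host graphite.example.com, the metric paths, and the render_url helper are all made up for the example.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, **params):
    """Build a Graphite render URL (hypothetical helper for illustration)."""
    query = [("target", t) for t in targets] + sorted(params.items())
    return f"{base_url}/render?{urlencode(query)}"


# The paths differ in depth, so the interesting node sits at a different
# index in each; aliasByNode picks it out by position.
targets = [
    "aliasByNode(servers.*.cpu.load, 1)",
    "aliasByNode(apps.web.*.requests.count, 2)",
]

# graphite.example.com is a placeholder host, not a real endpoint.
url = render_url("http://graphite.example.com", targets, format="json")
```

Each series then shows up in the legend under the name of the wildcarded node, regardless of how deep it sits in the tree.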

Read the rest of this story...

The Story Behind Tasseo

2012-05-07 10:19:32 by jdixon

A little over a week ago I released the Tasseo dashboard. The response I got back was nothing short of astonishing. Tasseo is a Graphite dashboard, one of many to have been released in recent months. That fact alone led me to believe it would fly quietly under the radar. I couldn't have been more wrong; Tasseo (pronounced like Casio) tallied over 200 GitHub watchers in the first weekend, and should pass 300 today.

Tasseo was originally developed as a from-the-ground-up reimplementation of the Pulse dashboard we use at Heroku. Pulse has been a tremendously valuable tool for us; unfortunately, it has some drawbacks that make it a challenge to maintain.

Read the rest of this story...

A Precautionary Tale for Graphite Users

2012-05-02 22:09:36 by jdixon

This morning I was collecting some graphs for one of our weekly status meetings. Asked to find something that represented the state of our Graphite system, I naturally gravitated to my usual standbys, "Carbon_Performance" (top) and "Carbon_Inbound_Bandwidth" (bottom).

The SysAdmin in me loves these because they highlight resource utilization on the server. While the former details disk I/O and CPU, the latter tracks inbound bandwidth in terms of bits and packets per second. Although the network graph seems utterly boring (inasmuch as we've all used these in one form or another, from vendor-supplied dashboards to Cacti installations), it's this one that is actually the more complicated of the two to configure.

Read the rest of this story...