Unhelpful Graphite Tip #10 - Time Shifting

2012-04-25 08:44:44 by jdixon

Let's say you want to see how a particular metric compares to some point in the past. This is a common practice in troubleshooting and capacity planning. What's the best way to achieve this in Graphite?

I might start off by selecting the past four weeks and visually discern the trends from week to week. Here's a graph showing the last month of AMQP activity. We can see that traffic was oscillating quite a bit over the first week and a half before smoothing out and gradually trending downward.
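Eyeballing a month-long graph only gets you so far. Graphite also has a timeShift() function that draws a series offset by a given period, so two weeks can share the same axes. Here's a minimal sketch of building such a render URL. The host and metric name are hypothetical placeholders, not values from this post.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration; substitute your own host and metric.
GRAPHITE_URL = "http://graphite.example.com"
METRIC = "amqp.messages.rate"


def time_shift_url(metric, shift="1w", window="-1w"):
    """Build a render URL that overlays a metric with its time-shifted copy."""
    params = [
        ("target", metric),                                  # the current period
        ("target", 'timeShift(%s,"%s")' % (metric, shift)),  # same metric, shifted back
        ("from", window),
        ("format", "png"),
    ]
    return GRAPHITE_URL + "/render?" + urlencode(params)


print(time_shift_url(METRIC))
```

Repeating the target parameter is how the render API accepts more than one series on a single graph.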

Read the rest of this story...

Unhelpful Graphite Tip #9 - xFilesFactor

2012-04-19 08:24:20 by jdixon

I love that Graphite can support per-second resolution. We've started to use it more frequently with applications that emit a constant stream of metrics to one of our aggregators. But there are times when an application might send updates less frequently, or when transient failures or network congestion result in lost metrics. In these cases it makes sense to adjust your xFilesFactor value.

You may remember my last post that mentioned the whisper-info.py utility. It helps you extract metadata from your whisper files. Take, for example, a whisper file for one of our collectd metrics:

$ sudo whisper-info.py /data/whisper/collectd/63694/swap/used.wsp

maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 534580

Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 52

...
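To make the xFilesFactor of 0.5 above concrete: when Whisper rolls points up into a lower-resolution archive, it only writes an aggregate if the fraction of known (non-null) points in the window is at least xFilesFactor. Here's a rough sketch of that rule using the average method. The function name and sample values are mine, and this is an illustration rather than Whisper's actual code.

```python
def aggregate(points, x_files_factor=0.5):
    """Average one rollup window, or return None if too few points are known."""
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None   # not enough data; the rolled-up point stays empty
    return sum(known) / len(known)


# Five samples rolling up into one lower-resolution point (made-up values).
window = [10.0, None, 12.0, None, None]   # only 2 of 5 (40%) are known

print(aggregate(window, 0.5))   # None: 40% is below the 50% threshold
print(aggregate(window, 0.2))   # 11.0: 40% clears a 20% threshold
```

A lower xFilesFactor keeps sparse metrics alive through rollups, at the cost of aggregates built from fewer samples.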

Read the rest of this story...

Unhelpful Graphite Tip #8 - Dump your Whisper Metrics

2012-04-18 10:59:38 by jdixon

If you've mucked around with your Whisper storage policies or needed to migrate your data to/from Graphite, there's a good chance you've used some of the bin scripts like whisper-info.py and whisper-fetch.py. Unfortunately there are some drawbacks with whisper-fetch.py, most notably that it only fetches content from the first archive to match the requested time period, and it won't return the original raw data after the rollup policies take effect.
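If you want to see everything a file holds, one option is to read the archives directly. Below is a minimal sketch that walks every archive using the struct layouts defined in whisper.py (header, per-archive info, then timestamp/value points). The function name is my own, and it does no locking or validation, so treat it as a starting point rather than a replacement for the bundled scripts.

```python
import struct

# Layouts as defined in whisper.py: a header, one info record per archive, then points.
METADATA = struct.Struct("!2LfL")    # aggregationType, maxRetention, xFilesFactor, archiveCount
ARCHIVE_INFO = struct.Struct("!3L")  # offset, secondsPerPoint, points
POINT = struct.Struct("!Ld")         # timestamp, value


def dump_whisper(path):
    """Return the stored (timestamp, value) pairs from every archive in the file."""
    with open(path, "rb") as fh:
        data = fh.read()

    _agg, _max_retention, _xff, archive_count = METADATA.unpack_from(data, 0)

    archives = []
    for i in range(archive_count):
        info_offset = METADATA.size + i * ARCHIVE_INFO.size
        offset, seconds_per_point, points = ARCHIVE_INFO.unpack_from(data, info_offset)

        rows = []
        for n in range(points):
            timestamp, value = POINT.unpack_from(data, offset + n * POINT.size)
            if timestamp:   # a zero timestamp marks a slot that was never written
                rows.append((timestamp, value))
        archives.append({"secondsPerPoint": seconds_per_point, "points": sorted(rows)})
    return archives
```

Because each archive is a ring buffer, the rows are sorted by timestamp before being returned. Slots are not checked against the retention window, so very old points that were never overwritten will show up too.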

Read the rest of this story...

Unhelpful Graphite Tip #7 - Organizing your Saved Graphs

2012-04-15 19:01:26 by jdixon

If you're logged into Graphite as an authenticated user you have the option of saving graphs, which will appear under the "My Graphs" folder in the navigation tree to the left. There are some limitations (you can't include spaces in the filename) but it's otherwise a useful feature for saving and sharing graphs with others.

Unknown to some users, Graphite's dot-delimited naming schema is not only available in metrics, but in saved graph names as well. Once you've created or modified a graph, click the Save button (floppy disk icon)...

Read the rest of this story...

Graphite Script for Campfire Hubot

2012-04-13 23:42:57 by jdixon

We use Campfire extensively at $DAYJOB. As our Ops team is 100% remote, it's become indispensable for us. Although it has some minor warts (lack of proper timestamps) it works quite well as a chat medium and collaboration tool. Because of its popularity, there are tons of plugins available, not the least of which is Hubot, a bot written by GitHub specifically for Campfire.

Read the rest of this story...

Unhelpful Graphite Tip #6 - Filtering by Most Deviant

2012-04-13 09:57:32 by jdixon

I remember one day when I was trying to narrow down an application causing high load on an outlier within a fleet of servers. Nagios wasn't suitable for the task, as it only told me which hosts were currently spiking, not which ones had been spiking for a certain window of time. And it certainly couldn't identify a particular host based on a performance visualization.

My Graphite wizard hat went on and I went to work, narrowing down the list of suspects using wildcards and visually inspecting each host's load profile. Within 5 minutes I found my suspect and basked in my glory.

Naturally my brilliance was short-lived.
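For a feel of what "most deviant" filtering means, here's a minimal sketch that ranks hosts by how much their datapoints vary and keeps the top few. Graphite ships a mostDeviant() function for this kind of job, but the code below is my own illustration with made-up hosts and numbers, not Graphite's implementation.

```python
from statistics import pvariance


def most_deviant(series_by_host, n=1):
    """Return the n host names whose datapoints vary the most."""
    def spread(values):
        known = [v for v in values if v is not None]   # ignore gaps in the data
        return pvariance(known) if known else 0.0

    ranked = sorted(series_by_host, key=lambda host: spread(series_by_host[host]), reverse=True)
    return ranked[:n]


# Made-up load averages for three hypothetical hosts.
load = {
    "web01": [1.0, 1.1, 0.9, 1.0],
    "web02": [1.0, 6.5, 0.8, 7.2],   # the noisy one
    "web03": [2.0, 2.0, None, 2.1],
}
print(most_deviant(load))   # ['web02']
```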

Read the rest of this story...

Unhelpful Graphite Tip #5 - Solid State Drives

2012-04-12 15:32:40 by jdixon

Artur Bergman (@crucially) kindly recommends:

Editor's Note: Seriously though, you really should move your Whisper files over to SSD if you haven't already. The IO gain is tremendous and allows you to spend your time being more creative with process distribution across CPU cores (hint: future article).

Unhelpful Graphite Tip #4 - Bootstrap the Django DB

2012-04-12 08:17:43 by jdixon

If you're not already aware, Graphite uses Django as the web framework for its underpinnings. In particular, it relies on Django for all user administration, authentication and authorization facilities. This is convenient for Graphite developers, but can be rather inconvenient for Graphite administrators with little-to-no Django experience.

One of my earliest headaches with automating Graphite installations was trying to work around the interactive manage.py syncdb step from the installation doc. This is usually something everyone wants to run, since it performs the initial admin user creation.
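One possible way to script this step is Django's --noinput flag, which suppresses the prompts. Here's a minimal sketch of that. The manage.py path is an assumption about your install layout, and the helper names are mine. Note that syncdb only exists in the Django versions of this era; later releases replaced it with migrate.

```python
import subprocess
import sys

# Assumed location; adjust to wherever your Graphite webapp lives.
MANAGE_PY = "/opt/graphite/webapp/graphite/manage.py"


def syncdb_command(manage_py=MANAGE_PY):
    """Build the non-interactive syncdb invocation."""
    # --noinput tells Django not to prompt, which also skips superuser creation.
    return [sys.executable, manage_py, "syncdb", "--noinput"]


def run_syncdb(manage_py=MANAGE_PY):
    """Run syncdb and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(syncdb_command(manage_py))
```

The catch is that skipping the prompts also skips the admin user, so that account still has to be created some other way.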

Read the rest of this story...

Unhelpful Graphite Tip #3 - JSON Output

2012-04-11 10:06:13 by jdixon

I love JSON. No really, I fucking love JSON. It might have something to do with its phonetic approximation to my own name. Or it might be my preference for anything that hastens the death of XML. Either way, it's a handy format that's become ubiquitous for data interchange. And fortunately for those of us who prefer our graphs rendered client-side, Graphite supports it as an output format.
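As a quick illustration, adding format=json to a render request returns a list of series, each with a target name and a list of [value, timestamp] datapoints. Here's a minimal sketch of reshaping that payload for a client-side charting library. The sample payload and metric name are made up, and the function name is mine.

```python
import json

# Sample payload in the shape Graphite returns for format=json (values are made up).
SAMPLE = '''
[{"target": "collectd.host1.load.shortterm",
  "datapoints": [[0.42, 1334138700], [null, 1334138760], [0.57, 1334138820]]}]
'''


def parse_render_json(payload):
    """Map each target to its (timestamp, value) pairs, dropping null values."""
    result = {}
    for series in json.loads(payload):
        result[series["target"]] = [
            (timestamp, value)
            for value, timestamp in series["datapoints"]   # Graphite lists the value first
            if value is not None
        ]
    return result


print(parse_render_json(SAMPLE))
```

Nulls mark intervals with no data, so whether to drop them or keep them as gaps depends on how your graphing library draws missing points.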

Read the rest of this story...

Unhelpful Graphite Tip #2 - Graph Bookmarklet

2012-04-10 18:58:26 by jdixon

I wish I could say I've been using this little gem for years. Alas, I just learned about it last night courtesy of R. Tyler Croy (@agentdero). This has already been a godsend, in less than one full day of use.

Read the rest of this story...

Unhelpful Graphite Tip #1 - Frequency of Events

2012-04-10 00:41:02 by jdixon

I'd like to begin sharing more of my knowledge as it pertains to using Graphite in production. Most of these upcoming posts are bound to be of the "check out this cool function" variety, but hopefully you can stitch them together into something useful. Before I proceed, I'd like to thank Chris Davis and the team at Orbitz who started this incredible software project and released it to the open-source community. Without your work I'd be stuck using something... less awesome.

Today's tip comes courtesy of a combined effort by me and Michael Leinartas (@mleinart). I've used this particular combination of functions before to calculate the number of "events" in a series during a particular timeframe. Unfortunately I failed to record this query anywhere (pro-tip: save your best Graphite functions in a document or gist, you'll be glad you did) although I had a vague idea of the functions needed. Michael was kind enough to remind me of the particular order for chaining the functions.

Read the rest of this story...