A Foolishly Sensible Proposal for Graphite

2012-05-30 22:33:26 by jdixon

Let me get one thing out of the way: I fucking love Graphite. No other piece of software I've used has returned as much getting-shit-done value for so little personal investment. It's a triumph of function and utility, designed to help users collect metrics, store metrics, and extrapolate from those metrics with as little pain as humanly possible. The criticisms and suggestions I present below are conveyed with the utmost respect for all of Graphite's current and past developers, and in particular, Chris Davis and the original team at Orbitz who built and released it as open source. None of the rest of this post should detract from how rewarding it is to work with this tool.

That said, Graphite has warts. Virtually all of its many visual components (composer, navigation tree, search, auto-completer, flot, dashboard) feel incomplete at best, and broken at worst. It wouldn't be a stretch to label them The Island of Misfit Toys. One of its most useful features, the JavaScript CLI, has been outright deprecated. Authentication and authorization are mere afterthoughts, a result of coupling Graphite's user interface to Django. Metric names (paths) can be namespaced, but there is no good way to enforce or encourage good naming practices. Despite all of these issues (and more), I can think of no other trending suite that feels quite so natural in the hands of a skilled practitioner.

If that's the case, then what makes Graphite so useful? First, Graphite has an excellent API for creating server-side charts or outputting raw data for client-side rendering. The composer is nothing more than a GUI for constructing a Graphite API-compatible URL. Second, Graphite provides a simple interface for submitting metrics. Open a socket; send your metric name, value and timestamp; print a newline; repeat ad infinitum; close the socket. Better yet, use an aggregator like StatsD and have it handle that for you.
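The submission steps above can be sketched in a few lines of Python. This is a minimal sketch, not Graphite's own client code: the function names and the example metric path are made up for illustration, and the host is an assumption. Port 2003 is Carbon's default plaintext listener.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: 'path value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Open a socket, write each metric line, then close the socket.

    host is an assumption; point it at wherever Carbon is listening.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        for line in lines:
            sock.sendall(line.encode("ascii"))
```

With a Carbon listener running, something like send_metrics([format_metric("servers.web01.load", 0.42)]) would submit a single data point stamped with the current time.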

In other words, Graphite provides well-defined interfaces for input and output of metric data.
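On the output side, everything goes through a URL. As a rough illustration, here is how one might compose a render URL by hand, using the render API's target, from and format parameters. The helper name, the base URL and the metric path are all hypothetical.

```python
from urllib.parse import urlencode


def render_url(base, targets, frm="-1h", fmt="json"):
    """Compose a Graphite render URL for one or more targets.

    base is the root of a Graphite web install (an assumption here);
    each entry in targets becomes its own 'target' query parameter.
    """
    params = [("target", t) for t in targets]
    params += [("from", frm), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"
```

Calling render_url("http://graphite.example.com", ["servers.web01.load"]) produces a URL that asks for the last hour of that metric as JSON, which is all a client-side charting library needs.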

So what can be done to address its shortcomings? Why does it need to change at all?

Addressing the latter, it seems to me that Django was originally implemented as the best-of-breed solution at a time when there truly was nothing better. Times have changed, we now understand the benefits of middleware-based web applications, and we have seen that there is little motivation among developers to improve upon the existing framework. Layering new features on top of the existing code base will only serve to more tightly couple Graphite (and its users) to a user interface that is increasingly unpopular.

Ok, what next?

I believe the lowest-hanging fruit is the web service and the lack of authentication and authorization in the ingress metrics feed. A lightweight web alternative to the existing Django framework would allow users to easily deploy Graphite as modular components in a variety of datacenter or cloud-based environments. A proper middleware would also provide flexibility to support additional authentication mechanisms for the API service. Removing our dependency on Django and Apache would significantly decrease the installation and configuration burden associated with the existing web service.

Addressing the need for authentication and authorization in the metrics feed, it would (unfortunately) seem that Graphite is far enough along the adoption curve that any changes to the Carbon input formats would cause unwanted conversion and upgrade hurdles for the user. Alternatively, I propose the introduction of a Carbon proxy that would support pluggable authentication mechanisms, granular authorization, proper metric path namespacing (associated with authenticated users or accounts) and encryption. We've already built and deployed a Carbon proxy at $DAYJOB that could be the foundation for this sort of product.
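To make the namespacing idea concrete, here is a hypothetical sketch of the core rewrite step such a proxy might perform before forwarding a line to Carbon. It is not the proxy mentioned above: the token table, the function name and the error handling are all assumptions, and a real proxy would load accounts from a pluggable backend and handle the network and encryption layers separately.

```python
# Hypothetical token-to-namespace table, hardcoded for illustration only.
ACCOUNTS = {"s3cr3t-token": "acme"}


def namespace_line(token, line, accounts=ACCOUNTS):
    """Return the metric line rewritten under the caller's namespace.

    Raises PermissionError for an unknown token and ValueError for a
    line that is not 'path value timestamp' with a numeric value.
    """
    namespace = accounts.get(token)
    if namespace is None:
        raise PermissionError("unknown token")
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed metric line: {line!r}")
    path, value, timestamp = parts
    float(value)  # raises ValueError if the value is not numeric
    return f"{namespace}.{path} {value} {timestamp}\n"
```

Because the client still sends the ordinary plaintext format, nothing changes on the Carbon side; the proxy simply refuses unknown senders and keeps each account's metrics under its own prefix.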

Generally speaking, I know it's a Really Bad Idea (TM) to propose changes without a pull request in tow. Regardless, I fully intend to take a stab at these issues and invite others to join me. While there are those who would argue that it's better to invest our resources in the existing code base, I think that these changes would be a wise investment for keeping Graphite viable over the long run. I would love to hear your feedback and would welcome your participation.

Comments

at 2012-05-31 08:59:02, Jeff Blaine wrote in to say...

"Authentication and authorization are mere afterthoughts, a result of coupling Graphite's user interface to Django"

Elaborate?

"Times have changed, we now understand the benefits of middleware-based web applications"

What is it that you think Django cannot do? Django has a fully pluggable middleware layer.

"...and we have seen that there is little motivation among developers to improve upon the existing framework."

We have?

I dunno dude, respectfully, this all seems like a detail-less "I don't want to learn Django" strawman to me.

at 2012-05-31 15:12:37, Jason Dixon wrote in to say...

@Jeff - It's not a question of what Django can or cannot do. The point is that it's overkill that has - for the most part - stagnated within the Graphite tree. More importantly, the Graphite UI is a hurdle to adoption for some users. Increasingly, users are consuming Graphite purely through the API and ignoring the UI.

at 2012-05-31 16:04:19, Miah Johnson wrote in to say...

I think "don't want to learn Django" is somewhat of a valid argument, for the simple reason that many companies have small operations teams who are already juggling too many epic tasks. I think that anything that we can do to make configuration and management of our infrastructure _easier_ and more _concise_ is a beneficial way to spend effort. We shouldn't get too hung up on existing tools or techniques. If management of a tool in your infrastructure requires arcane knowledge then it is a problem.

at 2012-06-12 10:05:11, Mark Carey wrote in to say...

Decoupling software and making it more lightweight are two common strategies to make software easier to maintain. In that respect I think this is a good long term move.

I would suggest that any rewrite efforts remove the dependency on pycairo and any other libs that prevent Jython from being used.

at 2012-07-18 11:15:45, Jason Smith wrote in to say...

I don't know about Django, but I consider Apache httpd to be another unwelcome dependency.

Django supports FastCGI. But that's not enough. You make a good point: the software world is changing. IMO, web applications should embed their own http servers. This allows us to easily hook them into our infrastructure, with reverse-proxies, rewrites, or simply directly exposing them to users, case-by-case. (Obviously software shouldn't *depend* on its web server, but it should provide one.)

Since statsd is Node.js and Node.js includes an http server, a nice alternative would be an extremely simple Node.js command that does only one thing: stream HTTP queries to Graphite's FastCGI.

That is exactly what I wrote, to be a Graphite front-end. It's released yet (neither documented nor optimized); but the --help output makes things very clear. Expose a FastCGI service as a web service, and optionally fork into the background.

https://github.com/iriscouch/fastcgi

at 2012-10-16 20:37:04, James Pearson wrote in to say...

Django supports more than FastCGI - it works via WSGI, which is the interface pretty much *every* Python web framework and server conform to. Feel free to use gunicorn, flup, etc. and reverse-proxy through to them just like you would with node.

http://wsgi.readthedocs.org/en/latest/servers.html

at 2013-03-28 06:24:36, Mathieu wrote in to say...

Here is a proposal for using SSL authentication:

http://blog.bearstech.com/2013/03/authenticate-everything-with-ssl.html

Next step: isolating Graphite usage by namespace, so that you can only manipulate your own data.

at 2014-10-05 19:18:47, Tom Prince wrote in to say...

There appears to be a non-django implementation of the API now. (https://github.com/brutasse/graphite-api)
