The Grand Windows Laptop Experiment

2016-12-07 19:28:09 by jdixon

If you follow me on Twitter, you might remember my rants (well, among many other people) regarding Apple's new 2016 MacBook Pro line of laptops. There've been an abundance of reviews online, criticizing Apple for their "courage" to remove ports and functional keys that are still a mainstay in most users' workflows, and for actual performance regressions in most real-world scenarios. I think these changes reflect a desire by Apple to cater to their larger mass consumer audience, while at the same time streamlining a Mac product line facing an eroding market due to our increasingly mobile-first world.

Read the rest of this story...

What's up with Monitorama EU

2016-10-26 23:31:31 by jdixon

Way back in 2013, I put on a small event in Boston, Massachusetts focused on monitoring software and related themes. The unexpected popularity of the show led me to immediately turn around and announce a second event to take place later that year in Berlin, Germany. Both of the conferences were hugely successful and the subject matter clearly resonated with the larger DevOps and Engineering communities.

Read the rest of this story...

Benchmarking Graphite on NVMe

2016-09-13 13:56:30 by jdixon

Here's another quick update to demonstrate what's possible with a single Graphite node running master (these Carbon and Graphite-Web commits, specifically). As you'll see in the results below, this configuration was able to achieve 300k datapoints per second.

This test was performed on a Packet type 3 server with the pair of NVMe flash drives striped in a single LVM volume. Installation of the Graphite stack was still performed using Synthesize v.2.4.1. To take advantage of the increased I/O capacity I added more cache processes for a grand total of eight (8) relays and sixteen (16) caches. Five instances of Haggar ran concurrently, on a separate Packet type 1 server in the same Parsippany, NJ datacenter.

Read the rest of this story...

Benchmarking Graphite master on AWS

2016-09-12 12:15:25 by jdixon

Hello, friends. Just wanted to follow up the previous post with a quick update. As I've mentioned publicly, developing and cutting a new release from Graphite's master branch has become a personal and professional priority for me. And while I've become very familiar with much of the code base over the course of writing The Graphite Book and have thrown a lot of traffic at it over the last few months, I hadn't run any significant tests for performance regressions at scale (compared to the 0.9.15 release).

This round of tests used the same configuration and benchmarking processes as before. I neglected to mention this before, but all series of benchmarks started with installing Graphite using the Synthesize setup script. For the previous test I used Synthesize v.2.4.1 to install Graphite 0.9.15 on a 64-bit Ubuntu 14.04 LTS instance in Amazon's EC2 cloud. For this round I went with Synthesize v3.0.0RC2, which targets Graphite's master branch.

Read the rest of this story...

Benchmarking Carbon and Whisper 0.9.15 on AWS

2016-08-25 15:40:17 by jdixon

This is just a quick post to share some recent benchmarking results for a single Graphite 0.9.15 server. The host is a single EBS-optimized EC2 i2.4xlarge instance with a 400 GiB EBS Provisioned IOPS SSD (io1) with a requested 20k Max IOPS.

I'm not going to dive in too deep with the results, but I'll point out that with the following configuration we were able to increase batch writes effectively, resulting in a peak 38 points per update (pointsPerUpdate, averaged across all cache processes). This means that on average, caches were able to flush 38 datapoints from memory to disk with every write request.
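To make the pointsPerUpdate figure concrete, here's a rough sketch of the arithmetic in Python. The function name and the counter values are made up for illustration; they are not taken from the benchmark itself.

```python
def points_per_update(committed_points, update_operations):
    """Average number of datapoints flushed to disk per write request."""
    if update_operations == 0:
        # No writes happened in this interval, so there is nothing to average.
        return 0.0
    return committed_points / update_operations


# Hypothetical one-minute counters for a single cache process.
print(points_per_update(committed_points=1_140_000, update_operations=30_000))  # 38.0
```

The higher this ratio, the fewer write requests are needed to persist the same number of datapoints.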

Read the rest of this story...

Everybody Loves Graphite

2015-11-05 23:15:04 by jdixon

There was an article published recently - not here, and not to be linked or referenced here directly - proposing that "nobody loves Graphite" anymore. A linkbait title if I've ever heard one. Many folks linked this article to me, almost certainly expecting me to respond in an uproar. And yet, what I find myself really disappointed over is the obvious misrepresentation of fact (as it pertains to Graphite's technical limits) and an almost malicious disregard for the enormous community that uses it and contributes back to its ongoing development.

It's almost as if they're trying to sell you something. Nah, that couldn't be it.

I readily concede that Graphite was not designed for the transient nature of the sort of bleeding-edge containerized, clustering systems that are becoming popular in conference talks and Hacker News (if not in actual use in production, but we'll forgive them this tiny oversight). Admittedly, it takes an expertly skilled engineer to craft a background job for the purpose of removing old cruft. It's not every SysAdmin that knows how to cron, after all.

Read the rest of this story...

Graphite 0.9.14 - the Highlights

2015-11-01 11:11:12 by jdixon

As I mentioned in the previous blog post, we're perilously close to shipping the next Graphite release. Although we typically avoid large, sweeping changes in the stable branch, the long development cycle leading up to this particular release means we have a number of big new features and performance improvements to announce.

Rather than assume everyone will read and understand the significance of all the changes in the 0.9.14 Release Notes, I felt it would be a good idea to touch on some of them here. This collection represents a small handful of highlights among the numerous changes from this release. The sustained level of interest and contributions from the community continue to astound me. I can't thank everyone enough for their continued support of the Graphite project.

Read the rest of this story...

Graphite 0.9.14 - the Phoenix Release

2015-10-27 23:55:39 by jdixon

If you're like most Graphite users, you're probably wondering if and when there will ever be another release for the project. There hasn't been much public activity over the last couple of years, at least outside of GitHub. A lack of corporate sponsorship, in terms of dedicated developer and maintainer hours, means that the project receives attention as volunteers' schedules permit. Speaking solely for myself, I prioritize Graphite development somewhere behind family, work, the Monitorama conference, writing the Graphite book, and "other recreational activities".

Despite the lack of a regular release cycle, Graphite is as popular as ever. The Grafana project is going gangbusters, with Graphite as its priority time-series backend. A variety of new open source projects have cropped up offering high-performance alternatives to the original specification implementations (graphite-web and carbon). New software projects, both commercial and open source, continue to target Graphite API compatibility because of its ubiquity and ease of use. Heck, even those other competing time-series engines are forced to support Graphite-friendly interfaces. In some cases they even outperform their own proprietary ingress methods.

Read the rest of this story...

On Writing the Graphite Book

2015-02-22 23:56:56 by jdixon

I'm writing a book. This may come as a surprise given the lack of content on this blog over the last... entire year of 2014. Nevertheless, I'm pleased to report that the rumors are true and I am in fact writing a book about Graphite.

Three different editors at O'Reilly contacted me over the course of a few years about the possibility of authoring a volume about my favorite Open Source time-series rendering engine. I had significant concerns about the availability of free time I'd have to spend on this project, so I had to turn them down the first couple times. Last year, something finally clicked and I relented. And so, Monitoring with Graphite became a thing.

We've decided to release it as a work in progress, with the Early Release going on sale in December 2014 and an expected official release around June 2015. According to the outline we're almost at the halfway point of the book, so I think it's reasonable to say we're still on schedule.

If you've enjoyed my blog posts, I really think you'll love the book. I've included a healthy discussion around monitoring concepts and the "composable monitoring system", a deep dive into the Graphite components, fully fleshed-out installation processes and tips of the trade, and a helluva lot more. I aim to be as comprehensive as possible while still managing to keep it an entertaining read. Frankly, this is probably the only subject matter that I'll know well enough to write a book about, so I'm not about to let myself screw it up.

I encourage you to grab the Early Release Ebook and provide feedback. Your comments and suggestions (or questions) will continue to fuel the content for the rest of the book. And if you make it out to Monitorama this summer, I'll be happy to sign your tablet or laptop.

Graphite Tip - A Better Way to Store Events

2014-01-05 20:54:03 by jdixon

Graphite is well known for storing simple key/value metrics using the Whisper time-series database on-disk format. What is not well known about Graphite is that it also ships with a feature known as Events that supports a richer form of metrics storage suitable for, well, events. Imagine a place where you could store tagged metrics and additional data relevant to the event (e.g. code snippets, comments, etc). Many folks use NoSQL databases such as HBase for this purpose, and that's a perfectly reasonable approach. However, if you'd like to store these somewhere where they can be correlated with the rest of your Graphite metrics, then Events might be a good fit for you.
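As a minimal sketch of what submitting an event might look like, the Python below builds (but doesn't send) a JSON POST aimed at the /events/ endpoint. It assumes the endpoint accepts "what", "tags", "data" and "when" fields; the hostname, tag names and event text are all hypothetical.

```python
import json
import time
import urllib.request


def build_event_request(base_url, what, tags, data):
    """Build (but don't send) a POST request describing a Graphite event."""
    payload = {
        "what": what,              # short description of the event
        "tags": tags,              # assumed here to be a space-separated string
        "data": data,              # free-form details (comments, snippets, etc)
        "when": int(time.time()),  # epoch seconds
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/events/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_event_request(
    "http://graphite.example.com",  # hypothetical Graphite host
    what="Deployed web app",
    tags="deploy web",
    data="Released build 123",
)
print(request.full_url)
# To actually submit it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before pointing it at a real server.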

Read the rest of this story...

Graphite Tip - Django 1.4 Admin Workaround

2014-01-05 00:39:38 by jdixon

If you're using Graphite with Django 1.4 or newer, you've probably noticed the broken styling on the Admin module. This appears to be an annoyance at worst, but it's ugly nonetheless. I don't have a fix for this yet, but I have a workaround for anyone using Apache with their Graphite web UI.

Read the rest of this story...

Graphite Tip - Working with Carbonate

2014-01-02 20:29:59 by jdixon

One of my friends at GitHub, Scott Sanders, recently published a new suite of tools collectively known as Carbonate. Anyone who has had the "pleasure" of migrating Graphite to one or more new servers, in production, has likely felt the pain of dealing with gaps in your time-series data. This is a common source of pain for many administrators; I'm really pleased that Scott was able to put together this collection of shell primitives for managing Whisper migrations.


Read the rest of this story...

Migrating Graphite from SQLite to PostgreSQL

2013-12-14 19:29:20 by jdixon

As mentioned in my previous article, I no longer recommend using SQLite as a Graphite backend for anything outside of development or testing work. It is too lenient with data types, and doesn't provide the levels of concurrency I'd like to see in an RDBMS for a production web service.

This opinion was cultivated almost exclusively from my recent experience migrating a single-node Graphite instance with an SQLite database to an HA pair of Graphite nodes with a shared PostgreSQL backend. For those of you considering migrating off SQLite to PostgreSQL, this article documents my initial struggles and eventual fixes for this transition.

Read the rest of this story...

Why You Shouldn't use SQLite with Graphite

2013-12-10 19:44:07 by jdixon

If you've ever had the pleasure of installing Graphite, you're almost certainly aware that it uses Django as its web framework. In order to support features like saving graphs and dashboards, Graphite needs somewhere to store the data that describes these objects. As you might expect, a relational database with support for SQL is a dandy place for this sort of relational data. Django supports a number of RDBMS backends using the Django ORM, making it relatively painless to get started with Graphite in a development or test environment using the popular SQLite database engine.

Read the rest of this story...

My Impressions of InfluxDB

2013-11-11 12:15:38 by jdixon

I mentioned last week that I was planning to look closer at InfluxDB this past weekend, and some folks asked me to do a writeup on my findings.

InfluxDB is a time-series metrics and events database based on the LevelDB key-value store. LevelDB was written and open sourced by Google, and is an optional backend for Riak. InfluxDB (or "Influx", for short) inherits many of LevelDB's default characteristics, which means it's optimized for writes and uses compression by default, but it can be slow for reads and deletes.

Read the rest of this story...

What's Up with Playfair?

2013-10-19 02:58:37 by jdixon

Within the last hour I stumbled across a tweet from Dan Ryan mentioning a new hosted Graphite + StatsD service called Playfair. As you might expect, this piqued my interest.

Immediately, I thought of Hosted Graphite and wondered how this compares with their offering. Would it have its own dashboard? Was it a DigitalOcean-backed Graphite instance (admittedly, something I've considered trying to package up myself)? I hopped over to their website and looked around.

Read the rest of this story...

Graphite Tip - Mixing Lines and Stacks

2013-08-16 13:43:15 by jdixon

One of Graphite's shortcomings is that it's not easy to construct a composite chart of both lines and area sections. In fact, it's not possible at all unless you're willing to stack your areas. But if you are dealing with data where it makes sense to stack them, and you want to correlate that with something else as a line series, here's an example demonstrating how you can do it.

Read the rest of this story...

Are We Ready to Kill Thresholds?

2013-06-26 09:12:54 by jdixon

I've been hearing a lot of chatter from various sources that adaptive fault detection is going to be The New Shit ™ and that static thresholds are virtually useless because they lack context. While I agree that some of the more advanced techniques sound amazing (and make no mistake, I'm really excited about the possibilities here), it's foolish to think that thresholds as a measure of fault conditions are useless.

Read the rest of this story...

Dusk - Yet Another Graphite Dashboard

2013-06-21 19:36:28 by jdixon

Not too long ago we were looking for a way to visualize a group of metrics across our entire fleet. Although you could render all of the metrics on one graph, it becomes nearly impossible to distinguish one from another. Jesse Newland (@jnewland) suggested that we look at Cubism.js' horizon charts. The nice thing about horizon charts is that you can cram a lot of information into a small vertical space, due to the way they render "overlapping" values with increasing intensity. One thing led to another, and soon Dusk was born.

Read the rest of this story...

Call for OSS Project Instrumentation

2013-05-28 09:56:28 by jdixon

As an open source developer, some portion of my time is spent not just coding and responding to user feedback, but also acting as a Project-slash-Product Manager. I have to determine which bugs to prioritize, which features are necessary, and where to allocate my finite resources. Much of this is driven by what interests me at the time and which features will best fit into my overarching vision for the project.

Read the rest of this story...

Graphite Tip - Counting Number of Metrics Reported

2013-05-27 13:59:50 by jdixon

There's been many a time when I've asked the question "I wonder how many hosts are sending this metric?" Unfortunately there's no built-in Graphite function for determining the number of hosts submitting a particular metric (or tree of metrics). But this morning I stumbled across a brilliant hack of a Graphite query by Jesse Newland (@jnewland) for rendering this value.
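One way to reason about a count like this is to turn every reported value into a 1 and then sum across series. The Python below emulates that idea on plain lists. It's a sketch of the general approach (scale to zero, offset by one, sum), not necessarily the query this post refers to, and the sample data is invented.

```python
def count_reporting(series_list):
    """Count, per timestamp, how many series reported a value.

    Each series is a list of values aligned on the same timestamps,
    with None marking an interval where nothing was received.
    """
    counts = []
    for values in zip(*series_list):
        # Scale every real value to 0, then offset it to 1; gaps stay None.
        ones = [None if v is None else v * 0 + 1 for v in values]
        # Summing the ones yields the number of series that reported.
        counts.append(sum(o for o in ones if o is not None))
    return counts


# Hypothetical load averages from three hosts over three intervals.
hosts = [
    [0.4, 0.5, None],
    [1.2, None, None],
    [0.1, 0.3, 0.2],
]
print(count_reporting(hosts))  # [3.0, 2.0, 1.0]
```

Because gaps stay as None rather than becoming zero, a host that stops reporting drops out of the count instead of quietly inflating it.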

Read the rest of this story...

WTF is Chartroulette

2013-05-13 16:32:42 by jdixon

Sometimes the silliest features are the ones that inspire you most. This was certainly the case with the new Chartroulette view that I recently merged into Descartes. Because I wanted so badly for this to become a reality it forced me to knock out some other dependencies (user model, favorite dashboards, and better user mapping) rather quickly.

To be fair, there's nothing silly about the idea behind Chartroulette. At GitHub we have an internal app by @maddox that allows users to rotate any Mac or iOS-based device's screen through a series of website URLs. Typically we use this to cycle through dashboards or graphs. While I'd love to see this open-sourced, I know that Jon is a very busy guy so I figured that emulating this functionality within Descartes might be the next best thing.

Read the rest of this story...

Feeding Params into Descartes

2013-05-06 13:07:32 by jdixon

This is a relatively minor enhancement in terms of LoC but it would take too many words to describe on Twitter so here we are. Recent commits added support for passing interval and columns parameters into Descartes views (graphs, dashboards, etc). Previously you would always get the default layout whenever loading any Descartes page.

Read the rest of this story...

Graphite Tip - Group by Node

2013-04-14 18:23:29 by jdixon

In the process of setting up some graphs for Status Board, I thought it would be nice to render my GitHub activity (in terms of commits). As I demonstrated in a post last year, you can fire off a metric to Graphite using GitHub's post-commit webhook feature. Rendered with drawAsInfinite, this is nice for getting a rough visualization of your commit activity, but doesn't provide total counts. Alternatively, you could use group with summarize to get totals per interval, but you wouldn't be able to view per-repository numbers.

Read the rest of this story...

My Thoughts on FOSS Crowdfunding

2013-04-07 23:04:24 by jdixon

I was recently cold-emailed about a new service, Catincan, that offers a Kickstarter-like bounty system for open source software projects or features. Someone representing the company reached out to me (and presumably, many others in the open source community) for feedback, generally about the funding of open source development, and specifically about their own service.

Read the rest of this story...

Contribute to Open Source Monitoring Projects

2013-04-03 16:40:34 by jdixon

You say you want to contribute to an Open Source project, but you're not sure where to start? Have an interest in monitoring, trending or logging software? Hop on over to the Monitorama Hackathon issues list and take a look around. In the weeks leading up to the event we seeded the repo with a bunch of tasks/feature requests/bug reports that are easily digestible over the course of a day or two. These are a great place to get started on a new project, or nail out some quick issues that have an immediate impact.

Read the rest of this story...

Graphite Tip - Grouping Release Metrics in the Legend

2013-04-03 12:48:07 by jdixon

Last year I gave examples for using drawAsInfinite to help visualize the frequency of particular events (deploys, commits, etc). One of the side effects that I failed to mention is that these will quickly fill up your legend with labels, making it impossible to view the legend at all. It's likely that you've seen this sort of thing at least once (assuming you forced hideLegend=false):

Read the rest of this story...

Graphite Tip - Converting Zeroes into Nulls

2013-04-01 17:43:26 by jdixon

I was looking at some internal data for @jnunemaker and @jfryman today when I stumbled across the Data Filters group of functions inside the Graphite composer. These functions are handy for those times when you want to exclude a subset of a particular series of data, for whatever reason. In our case, we were looking at some metrics where we had spikes of data that were interesting, and a lot of uninteresting data reported as zeroes.
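To illustrate the effect, here's a small Python emulation of a "remove values below a threshold" filter, similar in spirit to Graphite's removeBelowValue. The helper and sample series are hypothetical, and the function used in the full post may differ.

```python
def remove_below_value(values, threshold):
    """Return a copy of values with anything below threshold set to None."""
    return [None if v is not None and v < threshold else v for v in values]


# Hypothetical series: a few interesting spikes among uninteresting zeroes.
series = [0, 0, 12, 0, 7, None, 0]
print(remove_below_value(series, 1))  # [None, None, 12, None, 7, None, None]
```

With the zeroes converted to nulls, only the spikes remain to be drawn.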

Read the rest of this story...

Thoughts from Monitorama 2013

2013-03-30 10:34:42 by jdixon

This is not your typical conference review. This is a braindump of my thoughts following the organization and execution of the 2013 Monitorama Conference and Hackathon in Boston (Cambridge), Massachusetts.

Read the rest of this story...

Monitorama Hackathon

2013-03-05 10:38:23 by jdixon

One of the overarching themes that drove me to organize Monitorama was the desire to bring together OSS developers in an effort to improve the current state of monitoring and trending software. I grew impatient with the lack of measurable progress that happens at the typical SysAdmin/WebOps/DevOps-style events, which tend to focus on automation and traditional operations fare. While I'm pleased that everyone is excited about our speaker lineup, the work we accomplish at this Hackathon will be the true barometer of our success. With this in mind I have some points to consider as you prepare for your attendance and participation at the event.

Read the rest of this story...

Adding a Metrics Cache to Descartes

2012-11-08 00:00:26 by jdixon

Update / TL;DR: Thanks to Bernd Ahlers (@berndahlers) for clueing me into the fact that you can call rufus-scheduler directly rather than indirectly through resque-scheduler. Because it uses EventMachine, there's no need to run separate worker processes or queue up the jobs. Consider me sold. The changes have already been committed.

If you still want to read the original post, continue on.


Today I merged in a refactor of the Descartes bits that deal with metrics. Specifically, the live Metrics tab and sparklines view. This will have a profound effect on performance, but can also have a surprising effect on your wallet if you're not paying attention.

So, a little background on how Descartes used to operate and why this change was necessary. Not too long ago I added a new Metrics page that displays sparklines for every metric in your Graphite server and lets you click on them to create a composite graph. Although the page is still rather immature, it's useful for basic visualization and graph creation. Personally I think its major selling point right now is in the sparklines I mentioned. This is one thing that you don't really get with native Graphite -- being able to quickly see activity patterns on any metrics without going through the hassle of actually creating a graph. This is made that much more awesome by the presence of live filtering. Click on the Add to Graph button and you're presented with an additional input field that, as you type a string, will filter down the list of metric sparklines you're viewing in realtime.

Read the rest of this story...

A Simple HTTP POST Server in Node.js

2012-10-16 21:48:16 by jdixon

In the process of hacking on a plugin for the Uptime project I realized that I needed a simple HTTP server capable of receiving and dumping JSON data via POST. I'm well aware of the awesome Python SimpleHTTPServer module, but alas it doesn't support POST requests.

Fortunately I was able to throw together a quick little server using the sample HTTP server on the Node.js project website along with their API docs. I know this is a ridiculously simple daemon, but someone else might find it as useful as I did.

Read the rest of this story...

Assembling Uptime, Umpire and Graphite

2012-10-16 11:05:08 by jdixon

Just this morning I discovered the Uptime project over on GitHub. The author bills it as "A simple HTTP remote monitoring utility using Node.js and MongoDB". I'm already in love with this tool thanks to its composability and ease of use.

The documentation over at the Uptime project is quite good, so I won't bore you with the details. The basic gist is that you'll want to have a MongoDB server available (OS X users can just brew install mongodb) and Node.js (at least version 0.8). Clone the repo locally and then run node app.js to start the monitor (web UI) and analyzer (check engine).

Read the rest of this story...

A Candid Word about Monitorama

2012-10-12 10:22:31 by jdixon

Registration for Monitorama opens up one week from today. We're in the highly unusual situation of being a first-year conference that will very likely sell out in almost as short a time as it took to plan it. However, while I'm thrilled that so many incredible people want to attend, I want to take a moment to make sure everyone fully understands what this event is truly about, and what I personally expect to come out of it.

Monitorama will be an inclusive conference. There will be no discrimination according to race, gender, sexual preference, programming language, operating system or editor. You will not be judged on your experience, your abilities as a programmer or the number of followers on GitHub. The only tacit requirement will be a passion for our shared open-source monitoring toolset and the tenacity to dig in, have fun and help advance the state of our craft over the course of this two-day event.

Many of you will write code. Some of you will work on documentation. Others will speak or present workshops to inspire the other participants and help bring focus to our mission.

Everyone who registers should do so with the understanding that they are expected to participate. Attendees are for other, lesser, conferences. Monitorama is all about getting shit done and having fun doing it. Do not let this scare you. We will all walk away from Boston knowing that there is great work yet to be done, but with the collective wisdom and progress gained from an intense program of collaboration and learning.

There are only 200 seats available for Monitorama 2013. I hope to see your name on the ledger, one week from today.

Stray Bits from my DevOpsDays Roma Talk

2012-10-08 09:15:01 by jdixon

A few stray bits of information following my presentation at DevOpsDays Roma.

The talk has received a ton of positive feedback from everyone. The slides in particular have been getting a ton of redistribution on Twitter. I'm not sure if this is a sign that my deck is that much better than the actual talk, but whatever. I'm glad that people are finding it useful and/or informative.

Read the rest of this story...

Trip to Italy

2012-10-06 13:21:15 by jdixon

I've just concluded a week in Italy as part of my visit to speak at DevOpsDays Roma. Most people don't know this, but I was an Architecture student at Georgia Tech many years ago. As such, I was exposed to a lot of Greek and Roman history. This made a lasting impression on me; I've always dreamed of visiting Rome and it was a stroke of luck when I heard about the conference and was eventually accepted to speak.

Read the rest of this story...

Surge 2012 Postmortem

2012-09-28 18:35:11 by jdixon

The curtain has lowered on another couple days of scalability lessons and "disaster porn" at this year's Surge conference. Despite my initial misgivings that the registration fees were too high, the conference organizers have once again put together an experience that is quite possibly the best among all technically-oriented events.

Read the rest of this story...

#monitoringsucks BoF at Surge 2012

2012-09-27 14:13:16 by jdixon

Kicking off this year's Surge conference was a pair of BoF sessions. The #monitoringsucks one was packed, to the extent that a number of us had to steal chairs from the Chef BoF across the hall. I remembered to write down some of the highlights from the session. Note that I'm not quoting anyone directly and am summarizing each speaker to the best of my recollection. If you were at the event and remember things differently, please notify me in the comments section below.

Read the rest of this story...

Screencast - Installing Graphite from Source

2012-08-28 12:51:54 by jdixon

A couple weeks ago I uploaded a new screencast and tweeted about it, but I completely forgot to mention it here. This is a fairly thorough demonstration for installing Graphite from git checkouts on an Ubuntu 10.04 server. Please let me know if you have any questions about the content or ideas for future screencasts.

I would recommend watching it in fullscreen in at least 720p resolution. All commands and configurations referenced in the video can be found here.

Trending your PagerDuty Alerts in Graphite

2012-08-28 12:04:09 by jdixon

We've noticed an increase in alerts recently at $DAYJOB. So naturally we thought it would be helpful to begin tracking Nagios alerts in Graphite. Alas, this will only help us going forward, so I wondered how difficult it would be to retrieve historic data from PagerDuty and import it into Graphite. Turns out it isn't too hard, although we have to work around some of the limitations in PagerDuty's Incidents API.

Read the rest of this story...

My Personal Roadmap

2012-08-26 18:21:47 by jdixon

I've been a little busy lately and haven't found the time to post any new articles, Graphite-related or otherwise. For those who missed the announcement, I started working at GitHub in July. Initially I continued my work on Descartes; more recently my time has been split up among a few different projects, both inside and outside of work. Although I generally detest announcing plans before shipping them, I thought others might like to read about what I'm working on these days.

Read the rest of this story...

Trending your GitHub Commits in Graphite

2012-07-22 22:45:55 by jdixon

Today I was browsing the list of service hooks that GitHub provides. I almost forgot that there's a simple WebHook service that POSTs commit information during the git post-receive hook to any external URL. This got me thinking that it would be nice to trend commit activity inside my Graphite server. Don't get me wrong... GitHub already provides some really nice visualization for project and committer activity on their site. However, as a data junkie, I'd love to be able to correlate this activity with my own application metrics.

This was a perfect fit for Backstop, the HTTP/JSON-to-Graphite bridge. After a couple hours of futzing around I had a working version. If you haven't used Backstop before, rest assured that getting started is pretty darn easy. In fact, if you're a Heroku customer, it's easy and free. There are just a few commands to get your own Backstop server running on Heroku.

Read the rest of this story...

Introducing Descartes

2012-07-10 17:51:45 by jdixon

Graphite is renowned for its usefulness and ease for prototyping new charts. It's also known for having a dashboard component that leaves much to be desired. In response the community has seen a rising tide of new dashboard projects aimed at filling this gap. The growing list of third-party Graphite dashboard projects is extensive, but continues to fall short in areas such as self-service, configuration, and collaboration.

Most of this software requires users to generate dashboards from JSON or other command-line gymnastics. While this is reasonable for many operations folk, it's an impediment for the engineers and business-oriented users; the same users that we want using this software for making sound decisions. Graph views are static and inflexible for collaboration and historical dialogues. In response to these shortcomings I've started the Descartes project.

Read the rest of this story...

Collection of D3 Tutorials

2012-07-09 13:24:05 by jdixon

A friend of mine recently asked for some good D3 tutorials and sites. At second glance these are an awesome collection of examples for using D3 and general visualization work.

Pro: You don't have to scour the web for these yourself.

Con: It's unlikely you'll ever fully consume all the awesome.

The State of Employment

2012-07-08 17:39:48 by jdixon

Seems that it's common for folks to blog about changes in employment. I hate to be left out on the fun, so I'll take a brief moment to officially announce my pending "new-hire" status with GitHub, effective tomorrow.

Friends who've already heard the news pepper their congratulations with a sense of confusion as to why I'd leave a good thing at Heroku. Indeed, I think most people in our industry would rank Heroku and GitHub at the top of their list of prospective employers. Unsurprisingly, I loved my job. I've never worked with a team of engineers as highly skilled or dedicated to their mission as the men and women at Heroku. So why would I leave?

Read the rest of this story...

El Cheapo Network Graph

2012-07-08 12:45:58 by jdixon

Here's an embarrassingly simple script I threw together this morning to track network latency to a handful of remote websites/networks from my home internet. Yes, I understand that these numbers are highly influenced by my proximity to various CDN networks and bear no resemblance to how actual web browsing would perform concurrently. That isn't the point. This is merely to demonstrate a cheap and easy way to get more metrics into Graphite; and at the same time, providing me with some useful reference for when my home internet provider will inevitably have hiccups.
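A minimal sketch of the same idea, not the script from the post: it times a TCP connect (swapped in here because it needs no special privileges) and formats the result in Graphite's plaintext protocol, `path value timestamp`. The `home.latency` prefix and the placeholder hosts are assumptions.

```python
import socket
import time


def tcp_connect_ms(host, port=80, timeout=3.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0


def metric_path(host, prefix="home.latency"):
    """Build a metric path; dots separate path components, so replace them."""
    return f"{prefix}.{host.replace('.', '_')}"


def graphite_line(path, value, timestamp):
    """Format one datapoint for Graphite's plaintext protocol."""
    return f"{path} {value:.2f} {int(timestamp)}\n"


if __name__ == "__main__":
    for host in ("example.com", "example.org"):  # placeholder hosts
        try:
            ms = tcp_connect_ms(host)
        except OSError:
            continue  # skip unreachable hosts rather than record a bogus value
        print(graphite_line(metric_path(host), ms, time.time()), end="")
```

Run from cron once a minute, the printed lines could be piped to the Carbon plaintext listener (port 2003 by default).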

Read the rest of this story...

Graph Porn and Sharing

2012-07-01 13:43:14 by jdixon

Part of what I see myself doing (by writing blog posts, creating software like Tasseo, etc) is to try and help others learn better ways of communicating our operational knowledge through visualization tools and methodologies. While I've gotten a lot of positive feedback from my Graphite articles, what I haven't seen as much is a two-way sharing of the harvested data made possible through these experiences.

I think there are a couple possible reasons for this: first, we work with "proprietary" data that our employers might not want divulged; second, we assume our data is immaterial and not worth sharing. For the former, I think this is a very similar argument that many of us had with employers during the push to open source software. There is much to be gained by sharing our raw data (perhaps without all of the proprietary metadata and labels that make it relevant to our business) and seeing those examples improved upon and returned by our peers.

Read the rest of this story...

Velocity 2012 Postmortem

2012-06-29 16:23:23 by jdixon

This week I traveled out west for my first Velocity conference as an attendee. I went out two years ago but I was so busy juggling exhibitor duties that I didn't get to enjoy any hallway networking or formal sessions. This year I went in with plans to catch as many sessions as possible, particularly those skewed towards monitoring, trending and operations workflow. As expected, I skipped quite a few talks but made up for it with a lot of quality time catching up with peers and reviewing new technologies (and philosophies) in the DevOps space.

Read the rest of this story...

Why Big Monitoring Software Sucks

2012-06-20 17:21:47 by jdixon

There are a ton of open-source and commercial monitoring tools available, so why do we claim that monitoring sucks? Certainly there are some usable tools out there; without them our systems would be even more unpredictable and unreliable than they already are. So what makes one tool sticky where others get tried and tossed aside?

Systems Administrators (and Engineers) are a finicky bunch. We prefer to build complex systems from small, sharp instruments rather than fight with larger, malleable (read: monolithic) software. There's a reason why Pingdom and Pager Duty are enormously popular among technically agile businesses. Cost is only a small part of the equation; these customers understand (implicitly, if not explicitly) that combining these small, sharp tools into a series of logically connected functions (fault detection, notifications and historical trending) is much easier than breaking apart an Enterprise-Ready monitoring suite and coercing it to meet their unique needs.

Read the rest of this story...

Watching the Carbon Feed

2012-06-01 11:40:25 by jdixon

This is one of my favorite, and certainly most underappreciated, graphs. Its simplicity belies its usefulness. This single chart gives me a holistic view of our metrics feed, writes to Whisper files, as well as general system health. At a glance I can correlate slow updates caused by a spike in Whisper file creations or a backup resulting in a higher PPU value. We use some of its targets with Nagios to monitor for metric feed issues. And it's always the first place I look whenever there's a whiff of Graphite problems.

Read the rest of this story...

Polling Graphite with Nagios

2012-05-31 20:37:00 by jdixon

I'm a big proponent of using Graphite as the source of truth for monitoring systems where polling host and service checks have traditionally been the norm. Realistically, this will take a long and gradual shift in philosophy by the larger IT community. Until then, we can still use Nagios and Graphite in tandem for powering more insightful checks of our application metrics.

There are actually a few different "check_graphite" scripts out there. The first one I saw announced publicly was Pierre-Yves Ritschard's check-graphite project. Shortly afterwards I published my own check_graphite script. Pierre's version is smaller but doesn't appear to automatically invert the thresholds (e.g. if critical is lower than warning). Otherwise you should be fine using either module; the remaining differences are mostly isolated to implementation details and default values. Since this is my blog, I'm going to use my script for this example. ;-)
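To illustrate what "inverting the thresholds" means, here's a small sketch. It isn't taken from either script; the function name and the inclusive comparisons are assumptions. The return values are the standard Nagios plugin exit codes.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def check_threshold(value, warning, critical):
    """Map a metric value to a Nagios status.

    If critical is lower than warning, assume that low values are the
    bad ones and flip the direction of the comparison.
    """
    inverted = critical < warning
    if inverted:
        if value <= critical:
            return CRITICAL
        if value <= warning:
            return WARNING
    else:
        if value >= critical:
            return CRITICAL
        if value >= warning:
            return WARNING
    return OK


if __name__ == "__main__":
    print(check_threshold(95, warning=80, critical=90))  # high is bad
    print(check_threshold(5, warning=20, critical=10))   # low is bad
```

Inversion matters for metrics like free disk space or request throughput, where a falling value is the one worth paging on.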

Read the rest of this story...

A Foolishly Sensible Proposal for Graphite

2012-05-30 22:33:26 by jdixon

Let me get one thing out of the way: I fucking love Graphite. No other piece of software I've used has returned as much getting-shit-done value for so little personal investment. It's a triumph of function and utility, designed to help users collect metrics, store metrics, and extrapolate from those metrics with as little pain as humanly possible. The criticisms and suggestions I present below are conveyed with the utmost respect for all of Graphite's current and past developers, and in particular, Chris Davis and the original team at Orbitz who built and released it as open source. None of the rest of this post should detract from how rewarding it is to work with this tool.

Read the rest of this story...

Taxes are Orthogonal to Wages

2012-05-17 08:16:24 by jdixon

I've been reading about taxing the rich and high unemployment and how the middle class is dying for as long as I can remember. What no one seems to be talking about is that these problems are orthogonal, not causal. Raising the taxes on billionaires is not going to buoy the middle class' ability to buy a new car. And neither will lowering taxes on the rich have the adverse effect. We're no more likely to see a trickle down effect from the government raising taxes as we are from lowering them.

What we really should be asking is what can we do to motivate business owners to increase employee wages without regulation? I'm not sure there is a good answer for that. I think this is a systemic problem within our upper class, one tied to a sense of privilege and a lack of personal responsibility to community.

I don't have any answers. I just wanted to get these thoughts down and see what others think.

Organizing Your Graphite Metrics

2012-05-09 22:13:10 by jdixon

One of the most common questions I get from Graphite users is how best to name and/or organize metric paths. I don't have an exhaustive list of "best practices" but I'd like to share some basic insights I've accumulated.

Misaligned paths are ok. I used to be tempted to try and keep different paths aligned in order to ease correlation of related targets within a graph. Fortunately there are plenty of helpful aliasing functions (and wildcards) to help tame unruly paths.

Read the rest of this story...

The Story Behind Tasseo

2012-05-07 10:19:32 by jdixon

A little over a week ago I released the Tasseo dashboard. The response I got back was nothing short of astonishing. Tasseo is a Graphite dashboard, one of many to have been released in recent months. That fact alone led me to believe it would fly quietly under the radar. I couldn't have been more wrong; Tasseo (pronounced like Casio) tallied over 200 GitHub watchers in the first weekend, and should pass 300 today.

Tasseo was originally developed as a from-the-ground-up reimplementation of the Pulse dashboard we use at Heroku. Pulse has been a tremendously valuable tool for us; unfortunately, it has some drawbacks that make it a challenge to maintain.

Read the rest of this story...

A Precautionary Tale for Graphite Users

2012-05-02 22:09:36 by jdixon

This morning I was collecting some graphs for one of our weekly status meetings. Asked to find something that represented the state of our Graphite system, I naturally gravitated to my usual standbys, "Carbon_Performance" (top) and "Carbon_Inbound_Bandwidth" (bottom).


The SysAdmin in me loves these because they highlight resource utilization on the server. While the former details disk I/O and CPU, the latter tracks inbound bandwidth in terms of bits and packets per-second. Although the network graph seems utterly boring (inasmuch as we've all used these in one form or another, from vendor-supplied dashboards to Cacti installations), it's this one that is actually the more complicated of the two to configure.

Read the rest of this story...

Unhelpful Graphite Tip #10 - Time Shifting

2012-04-25 08:44:44 by jdixon

Let's say you want to see how a particular metric compares to some point in the past. This is a common practice in troubleshooting and capacity planning. What's the best way to achieve this in Graphite?

I might start off by selecting the past four weeks and visually discern the trends from week to week. Here's a graph showing the last month of AMQP activity. We can see that traffic was oscillating quite a bit over the first week and a half before smoothing out and gradually trending downward.
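An alternative to eyeballing a month-long graph is to overlay earlier weeks on the current one. Graphite's render API includes a timeShift() function for this; the sketch below just builds such a URL. The hostname and the `amqp.messages.rate` metric name are placeholders, not values from this post.

```python
from urllib.parse import urlencode


def timeshift_url(base_url, target, shifts=("1w", "2w", "3w"), window="-1w"):
    """Build a render URL that overlays a metric with time-shifted copies."""
    targets = [target] + [f'timeShift({target},"{s}")' for s in shifts]
    query = urlencode([("target", t) for t in targets] + [("from", window)])
    return f"{base_url}/render?{query}"


if __name__ == "__main__":
    # Hypothetical server and metric, for illustration only.
    print(timeshift_url("http://graphite.example.com", "amqp.messages.rate"))
```

Each shifted target draws the same metric as it looked one, two, or three weeks earlier, all on a single one-week axis.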

Read the rest of this story...

Unhelpful Graphite Tip #9 - xFilesFactor

2012-04-19 08:24:20 by jdixon

I love that Graphite can support per-second resolution. We've started to use it more frequently with applications that emit a constant stream of metrics to one of our aggregators. But there are times when an application might send updates less frequently, or when transient failures or network congestion result in lost metrics. In this case it makes sense to adjust your xFilesFactor value.

You may remember my last post that mentioned the whisper-info.py utility. It helps you extract metadata from your whisper files. Take for example, a whisper file for one of our collectd metrics:

$ sudo whisper-info.py /data/whisper/collectd/63694/swap/used.wsp

maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 534580

Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 52

...
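The archive numbers above fit together with simple arithmetic: retention is secondsPerPoint times points, and size is points times 12 bytes (Whisper stores each datapoint as a 4-byte timestamp plus an 8-byte double). The sketch below checks that, and also shows the xFilesFactor rule in miniature, assuming the "average" aggregation method. The function names are made up for illustration.

```python
POINT_SIZE = 12  # bytes per Whisper datapoint (4-byte timestamp + 8-byte double)


def archive_stats(seconds_per_point, points):
    """Derive an archive's retention (seconds) and size (bytes)."""
    return {
        "retention": seconds_per_point * points,
        "size": points * POINT_SIZE,
    }


def rollup(values, x_files_factor=0.5):
    """Average values into one datapoint, or None if too few are known.

    At least x_files_factor of the input slots must be non-null for the
    aggregated point to be written.
    """
    known = [v for v in values if v is not None]
    if not values or len(known) / len(values) < x_files_factor:
        return None
    return sum(known) / len(known)


if __name__ == "__main__":
    print(archive_stats(60, 1440))            # matches Archive 0 above
    print(rollup([1.0, None, 3.0, None]))      # half known: aggregated
    print(rollup([1.0, None, None, None]))     # a quarter known: dropped
```

With the default of 0.5, a metric that only reports every few minutes into a 60-second archive would never survive a rollup, which is why lowering xFilesFactor helps for sparse data.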

Read the rest of this story...

Unhelpful Graphite Tip #8 - Dump your Whisper Metrics

2012-04-18 10:59:38 by jdixon

If you've mucked around with your Whisper storage policies or needed to migrate your data to/from Graphite, there's a good chance you've used some of the bin scripts like whisper-info.py and whisper-fetch.py. Unfortunately there are some drawbacks with whisper-fetch.py, most notably that it only fetches content from the first archive to match the requested time period, and it won't return the original raw data after the rollup policies take effect.

Read the rest of this story...

Unhelpful Graphite Tip #7 - Organizing your Saved Graphs

2012-04-15 19:01:26 by jdixon

If you're logged into Graphite as an authenticated user you have the option of saving graphs, which will appear under the "My Graphs" folder in the navigation tree to the left. There are some limitations (you can't include spaces in the filename) but it's otherwise a useful feature for saving and sharing graphs with others.

Unknown to some users, Graphite's dot-delimited naming schema is not only available in metrics, but in saved graph names as well. Once you've created or modified a graph, click the Save button (floppy disk icon)...

Read the rest of this story...

Graphite Script for Campfire Hubot

2012-04-13 23:42:57 by jdixon

We use Campfire extensively at $DAYJOB. As our Ops team is 100% remote, it's become indispensable for us. Although it has some minor warts (lack of proper timestamps) it works quite well as a chat medium and collaboration tool. Because of its popularity, there are tons of plugins available. Not the least of which is Hubot, a bot written by GitHub specifically for Campfire.

Read the rest of this story...

Unhelpful Graphite Tip #6 - Filtering by Most Deviant

2012-04-13 09:57:32 by jdixon

I remember one day when I was trying to narrow down an application causing high load on an outlier within a fleet of servers. Nagios wasn't suitable for the task, as it only told me which hosts were currently spiking, not which ones have been spiking for a certain window of time. And it certainly couldn't identify a particular host based on a performance visualization.

My Graphite wizard hat went on and I went to work, narrowing down the list of suspects using wildcards and visually inspecting each host's load profile. Within 5 minutes I found my suspect and basked in my glory.

Naturally my brilliance was short-lived.

Read the rest of this story...

Unhelpful Graphite Tip #5 - Solid State Drives

2012-04-12 15:32:40 by jdixon

Artur Bergman (@crucially) kindly recommends:

Editor's Note: Seriously though, you really should move your Whisper files over to SSD if you haven't already. The IO gain is tremendous and allows you to spend your time being more creative with process distribution across CPU cores (hint: future article).

Unhelpful Graphite Tip #4 - Bootstrap the Django DB

2012-04-12 08:17:43 by jdixon

If you're not already aware, Graphite uses Django as the web framework for its underpinnings. In particular, it relies on Django for all user administration, authentication and authorization facilities. This is convenient for Graphite developers, but can be rather inconvenient for Graphite administrators with little-to-no Django experience.

One of my earliest headaches with automating Graphite installations was trying to work around the interactive manage.py syncdb step from the installation doc. This is usually something everyone wants to run, since it performs the initial admin user creation.

Read the rest of this story...

Unhelpful Graphite Tip #3 - JSON Output

2012-04-11 10:06:13 by jdixon

I love JSON. No really, I fucking love JSON. It might have something to do with its phonetic approximation to my own name. Or it might be my preference for anything that hastens the death of XML. Either way, it's a handy format that's become ubiquitous for data interchange. And fortunately for those of us who prefer our graphs rendered client-side, Graphite supports it as an output format.
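For reference, adding format=json to a render request returns a list of series, each with a `target` name and a `datapoints` list of `[value, timestamp]` pairs (missing values come back as null). Here's a small sketch that pulls the most recent known value out of such a response; the sample data is made up.

```python
import json


def latest_values(render_json):
    """Return the most recent non-null value for each series in the response."""
    result = {}
    for series in json.loads(render_json):
        known = [value for value, _ts in series["datapoints"] if value is not None]
        result[series["target"]] = known[-1] if known else None
    return result


if __name__ == "__main__":
    # Made-up response in the shape returned by /render?format=json
    sample = """
    [
      {"target": "app.requests", "datapoints": [[10.0, 1334138700], [12.0, 1334138760], [null, 1334138820]]},
      {"target": "app.errors", "datapoints": [[null, 1334138700], [null, 1334138760]]}
    ]
    """
    print(latest_values(sample))
```

Skipping trailing nulls is useful because the newest slot in a series is often still empty when the request is made.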

Read the rest of this story...

Unhelpful Graphite Tip #2 - Graph Bookmarklet

2012-04-10 18:58:26 by jdixon

I wish I could say I've been using this little gem for years. Alas, I just learned about it last night courtesy of R. Tyler Croy (@agentdero). This has already been a godsend, in less than one full day of use.

Read the rest of this story...

Unhelpful Graphite Tip #1 - Frequency of Events

2012-04-10 00:41:02 by jdixon

I'd like to begin sharing more of my knowledge as it pertains to using Graphite in production. Most of these upcoming posts are bound to be of the "check out this cool function" variety, but hopefully you can stitch them together into something useful. Before I proceed, I'd like to thank Chris Davis and the team at Orbitz who started this incredible software project and released it to the open-source community. Without your work I'd be stuck using something... less awesome.

Today's tip comes courtesy of a combined effort by me and Michael Leinartas (@mleinart). I've used this particular combination of functions before to calculate the number of "events" in a series during a particular timeframe. Unfortunately I failed to record this query anywhere (pro-tip: save your best Graphite functions in a document or gist, you'll be glad you did) although I had a vague idea of the functions needed. Michael was kind enough to remind me of the particular order for chaining the functions.

Read the rest of this story...

On Being a Product

2012-03-21 11:18:35 by jdixon

By now you've heard the adage "You're not Facebook's customer, you're the product". This is a readily accepted dirty secret of social media. In fact, the practice of selling user data for advertising precedes the origins of the Internet itself. And yet, how many of us never give a second thought to granting third-party access to our private social data via OAuth logins on Facebook, Twitter and Google?

I've complained incessantly about abuses of these authentication services. On one hand you have rudimentary, coarse-grained access levels from the authentication providers. On the other you have lazy (or ill-informed) developers configuring their application to demand more rights than it actually needs to fulfill its service contract with the user. Fortunately the OAuth dialogs are mostly transparent about the privileges you're granting the application provider. Yet many users disregard this notice in exchange for the instant gratification of a popular new social media-powered toy.

Let's assume for a moment that the third-party app you just granted access to your private data is trustworthy. What happens in six months when that app continues to skyrocket in popularity and gets bought out by Evil Data Acquisition Conglomerate, Inc.? Your data just became their data. Which also happens to be sold out to advertisers and information brokers; or to anyone who wants to scrutinize your personal behavior, contacts, buying patterns, friends and family, education, political connections or employment history.

Think about this the next time you're signing up for a new photo-sharing app with sepia filters.

Cron As A Service

2012-02-11 10:35:19 by jdixon

I recently found myself in need of a way to run a one-off Ruby script at scheduled intervals. As this is a work project I didn't just want to run it on my laptop or some random server. Turns out there's an easy way to run this for free on the Heroku Cedar platform without having to piggyback it on a "real" application. Because there are no web processes running, we'll be able to limit our dyno usage to a single dyno (in other words, it's free).

The script itself handles garbage collection duties for removing expired hosts off our beta account with Boundary. Basically I just want it to run every hour and cull anything that hasn't reported to their collectors in a day. For the purpose of this article the contents of the script are inconsequential, although I intend to present it fully in a future post.

Read the rest of this story...

Trying to Get Shit Done

2012-02-09 23:38:00 by jdixon

This evening I asked for your suggestions on blocking online distractions, allowing me to focus on code for an extended period of time. I have a constant struggle with interruptions (read: shiny things) including online news, email and Twitter. There was a flood of responses in no time. Here are the more popular suggestions, along with my winners below.

  • quit the offending apps
  • block websites
  • login with a different user
  • work offline
  • coffee
  • music
  • self-control (lol rite)

I suspect that all of these would have some benefit, perhaps except coffee, which has never given me much of a boost anyways. As a side note, I've quit Diet Coke since my surgery and am exclusively drinking water. So it's possible that coffee might give me an insanely productive hit, but I'm not willing to tread that path yet. Here are the specific steps I took which seemed to work quite well for my particular workflow.

  1. quit Twitter app
  2. quit Chrome (primary browser)
  3. quit Firefox (used for HTML email, Facebook and banking)
  4. quit Propane (Campfire app, work communications)
  5. quit Adium
  6. quit Skype
  7. detached my remote screen session (mutt and irssi)
  8. equipped Sony MDR-V6 headphones
  9. launched Spotify radio (trance)
  10. put iTerm2 in full-screen mode (used for psql, git and debugging development server)
  11. put MacVim in full-screen mode
  12. launched Safari for API docs and development site

I'm pleased with the recent experiment, even if it only lasted one hour. I'll continue to make adjustments and report any significant improvements I find.

Sandwich Porn

2012-01-15 23:57:59 by jdixon

Can you get any simpler than this and still call it a sandwich? Belying its simplicity, this rustic combination has excellent texture and flavor. Lay them out on a baguette and call it a day.

Sequel Migrations on Heroku

2011-11-30 10:39:02 by jdixon

I find myself using Sequel in conjunction with Sinatra these days to write more of my web applications. Never having been a fan of ORMs in general, and mostly comfortable with the ickier bits of SQL wizardry, it took me a while to warm up to the idea of using one for database migrations. But I've seen the possibilities with stuff like ActiveRecord. Being able to migrate my schema into a versioned state is "dee-lish".

Read the rest of this story...

Wherefore art thou, BankSimple?

2011-10-01 16:46:33 by jdixon

Ever since I first heard about BankSimple over a year ago, I've been anxiously awaiting their public launch. They promised to "reinvent personal banking"; to make online banking simpler and, effectively, to not suck.

A little over a week ago, the BankSimple blog announced their first look at their online interface. A video walks through their search capabilities and demonstrates how finding data will be much simpler than the traditional online banking UI. Much of what I saw reminded me of Mint.com.

Over the last week I've talked with a number of friends and peers about the BankSimple announcement. Meanwhile we've had events like the OccupyWallStreet protests and backlash to Bank of America's announcement that they'll begin charging customers $60/year for the privilege of using a debit card.

I hope that BankSimple realizes that this is their opportunity in waiting. We don't need another Mint.com (yet). We don't need the Facebook equivalent of online banking. We need a fair, trustworthy banking service that doesn't rape us with fees, avoids predatory behavior and bait-and-switch offerings and isn't constantly focused on their next acquisition. Convenient features, such as depositing checks with a smartphone camera, are great but not a necessity. In short, we need the online equivalent of a good credit union.

Traditional banks have chosen to innovate only where it suits them. I hope that BankSimple can seize this opportunity and usher in a new age of consumer banking. Guys, please don't drop the ball. In the meantime, I think I'll start looking at the alternatives.

Thoughts on Surge 2011

2011-09-30 22:03:47 by jdixon

I had another great year at the Surge scalability conference in Baltimore, MD. Many of you know that I consider Surge to be "my baby", having conceived of the original conference vision, name and motto while employed at OmniTI. Even though I've moved on, I'm proud to see it grow and flourish while keeping its intimate feel intact.

Long story short, Surge 2011 kicked ass and took names. The speaker lineup was impressive and there were improvements across the board. Audio and video were outsourced to a professional team. On Thursday, lunch was provided and after the last session, Google had a nice party with plenty of hors d'oeuvres and beer. Everyone appeared to have a great time, sharing war stories and networking with peers.

I was invited by this year's team to organize the Lightning Talks on Wednesday night. Although I wish I'd scrapped the Karaoke PowerPoint event as I was inclined to do, the rest of the night went off without a hitch. The talks were consistently awesome, with Adam Jacob putting the capper on the evening.

Sessions on Thursday were excellent. Ben Fried held keynote honors, describing one of his greatest failures and how it helped shape the way Google IT operates. Artur Bergman was typically irreverent towards Linux kernel developers and inferior hardware. My favorite talk of the conference happened to come from Mark Imbriaco, the Director of Cloud Operations at Heroku (and coincidentally, my boss). But seriously, it was a brilliant interactive session full of insightful real-life incident response tactics and Q&A with the audience. Ironically, our Heroku operations team had to skip the 2:30pm slot to respond to an urgent incident within our architecture. My day wrapped up with a hilarious Choose-Your-Own-Adventure talk by Adam Jacob of Opscode.

Friday's sessions were good but struggled to compete with the consistently high quality of the previous day. I enjoyed Theo Schlossnagle's dissection of the Circonus real-time data subsystems, even though I'm intimately familiar with them already (as former Product Manager of the same). I caught the latter portions of Baron Schwartz's talk on performance metrics and the first half of Mike Panchenko's talk on cloud infrastructure. Unfortunately I had to skip the latter half of the conference's last day due to family commitments, but I've heard great things on Twitter about the remaining sessions.

My only real complaint was the Internet connectivity. Unlike last year, where I insisted on using Port Networks exclusively, this year the organizers chose to outsource part of the conference network to the Tremont IT staff. I'm unsure of the specific cause of the failures, but the symptoms were random failures to load TCP connections from various sites. On the first day, for example, I was unable to load the Surge website without it blocking on the Fontdeck CDN. The next day, I couldn't SSH to any EC2 hosts (although I was able to get to my personal server at ARP Networks) or load Basho and Etsy websites. Everyone I spoke with encountered similar failures, but not always the same sites (ruling out DNS issues). It appeared to be caused by overzealous application filtering or possibly a connection limit. I spoke to multiple OmniTI employees and nobody knew what the cause was, other than it had something to do with the Tremont service.

Also, I noticed a distinct lack of war stories as compared to last year's event. Surge was envisioned as a place where internet practitioners could share and learn from each other's mistakes. With a couple distinct exceptions, it just wasn't the case this year. It felt more like a chapter from O'Reilly Strata 2011 (read: Big Data) than Surge 2010. Nevertheless, there was plenty of good information to be gleaned throughout.

I had an incredible time at this year's event with my old friends at OmniTI, my operations and engineering compadres at Heroku, and countless friends and associates from IRC, Twitter and real life. As much as I enjoy conferences like Velocity, OSCON and DevOpsDays, I don't think they hold a candle to the concentration of operational and engineering excellence that you find at Surge. I'm thrilled that they're committed to keeping Surge at the Tremont. Although it's a quirky building with limited modernities, it guarantees that this event will never grow too large or become "commercially compromised". Hope to see all of you (well, 350 of you anyways) again next year.

Mad as Hell

2011-09-25 00:24:02 by jdixon

Laid a smackdown of truth on a loudmouth conservative parent at a kid's birthday party today. This guy (another kid's father) complained about pro athletes being "greedy" and how firemen are underpaid and not getting salary increases. I forced myself to get up and explain to him how:

  • Both are a result of the supply and demand of the capitalist society he believes so strongly in.
  • He's a hypocrite for supporting his own union but decrying the work of the NFL Players' Union.
  • While I respect his work and applaud him for his efforts, he chose his profession. Nobody forced him into it. If he wasn't there to do it, someone else would fill the vacuum and happily take his paycheck.
  • Blaming Obama, or Bush, or even Clinton for his woes is asinine. The toxic state of our government is a direct result of unbridled capitalism that's run unchecked for the last 30 years and continues to deteriorate.
  • Corporations like GE earn $14B in profits but pay zero in taxes.
  • Iraq didn't attack the United States, Al Qaeda did. We invaded Iraq without cause and continue to participate in wars without justification.
  • The aforementioned reasons are why our government programs and agencies are struggling to make ends meet, not because of "entitlements" paid out to citizens who've earned their social security.
  • We can't continue to support an upper-class that refuses to pay forward their dues to society.

Generally speaking, I'm a timid sort at social affairs. I'll keep to myself with a soda and my phone. But I heard this guy ranting and called him out on it. I shouldn't brag, but I'm fucking proud of standing up for my beliefs today. And to his credit, this guy had the decency to listen to what I was saying and, as best as I could tell, actually made sense of what he heard. At the end we shook hands and agreed that it's good to talk about these issues in healthy debate in public.

I don't know what came over me. It might have something to do with me watching Network again this week. Although the film is 35 years old, it's a striking narrative of today's problems in politics, mainstream media and corporate America. I'm tired of the ignorant posturing by both sides, fueled by self-serving tabloid hawkers and a political system tainted by corporate greed. There are decent people on both sides of the two-party spectrum but we're forced to eat from the news feedbag with a hood over our eyes.

I think Howard Beale said it best.

Why the Netflix Pivot Will Fail

2011-09-19 12:12:27 by jdixon

Netflix has always been about convenience. They killed Blockbuster on customer service, convenience and price. They've continued to compete against up-and-comers like Redbox thanks to their streaming offering. Netflix CEO Reed Hastings feels that now is the time for an epic pivot, allowing each product to stand (and compete) on its own.

As a cohesive unit with simplified billing, and offering customers the flexibility and choice they're accustomed to, Netflix is ubiquitous. Thanks to a loyal and addicted user base, they've made inroads with a multitude of video and gaming appliances. Consumers pushed for Netflix access on their TiVos and Xboxen, and TV manufacturers are starting to include Netflix support in newer "smart TVs".

And yet, I predict that Netflix/Qwikster will be dead within three years. The move to independent business units a) results in higher prices, b) makes it less convenient for viewers, and c) removes operational and marketing efficiencies found in their current business operations.

Redbox will continue to chip away at the Qwikster "legacy" DVD market. Who wants to wait 2 days for a DVD when I can pick one up in 10 minutes from the corner Walgreens? And how long before the movie studios push Netflix aside for more lucrative, direct partnerships with the appliance manufacturers and vendors?

Hastings is trying to sell the vision that this is a necessary pivot to remain viable in the market. Rather, I suspect this is their attempt to increase short-term shareholder value. I chose to stick with Netflix through the recent price hikes for the continued convenience of one-stop shopping. But with the lack of streaming choice, separate bills, and less convenient DVD rentals, I don't see myself sticking around for long.

Why Chef pwns Puppet

2011-09-10 09:18:16 by jdixon

I don't know why, it just does. Seriously, Ruby just destroys Puppet's DSL. I guess that's not really a vote for Chef as much as it is an indictment of poor DSLs. Either way I'm 10x as productive with a fraction of the hair loss.

Thanks, Opscode.

PostgreSQL 9.0 createdb Revelations (Updated)

2011-08-27 15:18:16 by jdixon

One of my first projects at Heroku has been to modernize our shared PostgreSQL offering (working with @asenchi). As we get closer to internal testing of our new service, @markimbriaco asked for benchmarks looking for any bottlenecks in PostgreSQL 9.x when creating large quantities of small databases. We've seen instances where Pg 8.3 will start to choke after 2000 databases on the same server and we're hoping that 9.x alleviates this issue.

My initial test was overly simplistic but still revealed some interesting patterns. I started with createdb on the command-line, generating 8000 roles and empty databases, serially. The results were promising, with PostgreSQL 9.0.4 (Ubuntu 10.04) able to scale up without any noticeable increase in latency. Unfortunately, it's not a terribly useful benchmark given the absence of any workload. And yet, I couldn't help but notice a pattern in the scatter plot:

Notice the gap between 500 and 600 ms? I don't have an explanation for this but I suspected that Pg has an internal condition that triggers for actions that take 500ms or longer. Regardless, our primary expectations had been met. Whatever bottleneck 8.3 demonstrated when creating databases on a server with large quantities of existing small databases appears to be fixed in 9.0.
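A serial run like the one described above can be approximated with a few lines of shell. This is only a sketch, not the original script: it assumes GNU date (for nanosecond timestamps) and the stock createuser and createdb client tools run as a role allowed to create both. The function names, the bench_ name prefix and the CSV output are made up for illustration, and it measures with date arithmetic rather than GNU time.

```shell
#!/bin/sh
# Sketch only: names and output format are illustrative, not the original script.

# Run a command and print its wall-clock duration in milliseconds.
# Relies on GNU date for %N (nanoseconds).
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Create one unprivileged role and an empty database owned by it.
create_one() {
    createuser -S -D -R "$1" && createdb -O "$1" "$1"
}

# Create COUNT role/database pairs serially, printing "index,milliseconds".
bench_createdb() {
    local count i
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$i,$(elapsed_ms create_one "bench_$i")"
        i=$((i + 1))
    done
}
```

Calling `bench_createdb 8000 > createdb.csv` would produce one row per database, ready to import into a spreadsheet for a scatter plot.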

The next test was to run a similar sequence with our new application server. It offers an internal RESTful API using Sinatra and Sequel to provision and manage customer databases on shared servers. The results for this run were even more enlightening. Check out the stratification:

Not only is the initial gap (around 400ms) even more pronounced, but you can see a pattern of latency introduced at 200ms intervals after the initial 400ms delay. I have no explanation for this, but I wanted to publish these results and see if anyone else has a guess as to what might be causing these patterns.


UPDATE: To rule out any distortion caused by GNU time, I ran another test using Ruby's Time class to get a more accurate representation. In the simplest terms, we start the clock with Time.now, connect to the database (no caching), create a role, create the database and stop the clock. Output is logged and then imported into Excel for plotting. I think the results speak for themselves (measured in milliseconds):

One Happy Ending

2011-08-25 12:47:34 by jdixon

Chances are you don't already know this about me, but I have a son who experienced a volvulus when he was three years old. This is a dangerous obstruction of the bowel caused by congenital intestinal malrotation (in other words, the bowels get "twisted" during fetal development). If the condition turns into a volvulus, the constricted portion of bowel will lose blood flow and die. In my son's case, he lost a significant portion of his large and small intestines. To be blunt, he was within minutes of death.

Nathan was unconscious in critical care for two weeks, in the hospital for seven months, and has been back at home trying to resume a normal life for the past three years. It would be an understatement to describe this as a taxing experience for our entire family. The first couple years required my wife to quit her job and become his home nurse. Either of us would be up all hours of the night administering drugs, vitamins and refilling the pump that provides his nutritional formula.

The routine has eased over the last year, primarily in frequency and volume of administrations. But it still required staying up past midnight, every night, refilling his formula and managing the pump. One of the common conditions of short gut patients is an aversion to eating. Like anyone who's broken a joint, it can require months of rehabilitation. We arranged for an eating therapist to visit weekly, helping us work with Nathan to overcome his fears and get comfortable with the act of chewing and swallowing. It's an acutely frustrating process, especially for someone like me who has no problem with eating (wink).

Slowly but surely, he's increased his daily intake of "normal" food. What started as a few Cheerios (literally) eaten by hand a year ago has increased to 1450 calories this past Saturday. His diet is still but a shadow compared to that of the average six-year-old, but it's expanding each week.

And then, just this morning, the doctor informed my wife that Nathan no longer has to use the formula pump at all. None of us expected this. I cried when I heard the news. I'm fighting back tears as I type these words. I can see daylight after all.

I hope this doesn't read as overly melodramatic. Truth be told, I didn't sit down to write this story for anyone else. But it feels good to write it down. To let the pain and joy and frustration and relief just pour out into the keyboard. It feels damn good.

You'll have to excuse me now. I'm going to take my son out for a hot dog. It's gonna be a great day.

Fixing Group Permissions after Migrating to OS X Lion

2011-07-31 22:17:51 by jdixon

I've discovered that restoring a user account from a Snow Leopard (10.6) Time Machine backup to a new system running Lion (10.7) fails to preserve membership in gid 20 (staff). I don't know if this only affects users in this particular scenario or might affect other upgrades/fresh installs, but it certainly bit me in the ass. I first encountered problems when trying to brew update, only to discover that it wouldn't let me write anything to /usr/local even though the directory had group-write permissions. Lo and behold, I finally realized that my membership had been revoked.

$ id
uid=501(jdixon) 
gid=501(jdixon) 
groups=501(jdixon),401(com.apple.access_screensharing),12(everyone),33(_appstore),
61(localaccounts),80(admin),98(_lpadmin),100(_lpoperator),204(_developer),
101(com.apple.access_ssh),402(com.apple.sharepoint.group.1)

The fix is simple enough. Use dscl to add yourself back to the staff group membership.

$ sudo dscl . append /Groups/staff GroupMembership `whoami`
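If you'd rather check before changing anything, the same fix can be wrapped in a small guard. This is a rough sketch under a few assumptions: in_group and ensure_staff are hypothetical helper names, membership is judged purely from the names printed by id -Gn, and group names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch only: in_group and ensure_staff are illustrative helper names.

# Succeed if USER belongs to GROUP, judging by the names `id -Gn` prints.
in_group() {
    local user group g
    user=$1
    group=$2
    for g in $(id -Gn "$user"); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Add USER (default: the current user) to staff only when it is missing.
ensure_staff() {
    local user
    user=${1:-$(id -un)}
    if in_group "$user" staff; then
        echo "$user is already in staff"
    else
        sudo dscl . append /Groups/staff GroupMembership "$user"
    fi
}
```

Running `ensure_staff` with no arguments applies the check to the current user; the group change shows up in id output from the next login session.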

Giant Robots Are Cool and Shit, But Seriously...

2011-07-15 16:54:34 by jdixon

I'm pleased to see so many people interested in the #monitoringsucks movement/campaign/whatever. My last post seemed to resonate with a lot of you out there. I'm excited to hear discussions surrounding APIs, command-line monitors, monitoring frameworks, etc. But I think a major thrust of my article was missed. It's not just that Nagios can be a pain in the ass, or that we need a modular monitoring system. What I'm trying to emphasize is that monolithic monitoring systems are bad and not suited for the task at hand.

Some very smart systems people (and developers) are trying to solve this problem in the open-source arena. Unfortunately, while they're attempting to diagnose and cure the problems in contemporary monitoring systems, they continue to architect big honking inflexible software projects. When I refer to "the Voltron of monitoring systems" I'm not talking about an enormous fucking automaton of monitoring, alerting and trending components. I mean that each component should exist independently of the others, with a stable data format and communications API. Any single component should be easily replaceable and deprecated. Authors should strive for competition because it makes the inclusive architecture that much stronger.

Realistically I see one of three things happening over the next 12-18 months:

  1. A community forms around a reasonable set of defined components and begins cranking out useful bits. Over time we have what resembles a useful ecosphere of monitoring tools and users.
  2. Motivated developers continue to solve the issues affecting monitoring software, but in their own walled garden projects. We benefit from a larger pool of projects to choose from, but they all continue to suffer from NIH syndrome.
  3. I'm disregarded as a nutcase. Nothing changes and we continue to use the same crappy ubiquitous software.

At this point I think the most likely outcome is a combination of numbers 1 and 2. It's hard for anyone to justify working on a disassociated component when the related components it needs to be useful might never be developed. On the other hand, if someone working on a monolithic project has the foresight to break up the bits into a true Service Oriented Architecture, then it would be feasible for external developers to fork individual units.

Achievement Unlocked: Heroku Operations

2011-07-11 11:59:56 by jdixon

I'm proud to announce that I'll be starting at Heroku in a couple of weeks. This is an exciting opportunity to work at a place that breathes DevOps and eats Infrastructure as Code. Whenever you hear someone describing "Platform as a Service", there's a good chance that Heroku will be the example they're talking about.

I first met Mark Imbriaco (@markimbriaco) when he was the Operations Manager at 37signals. Mark's a level-headed guy with an undeniable talent for Web Operations and an excellent track record for supporting his customers. It was no surprise to me when he took over as the Director of Cloud Operations at Heroku. Even after the acquisition by Salesforce.com last December, they've continued to innovate at a breakneck speed (proof here, here, here and here).

Heroku development and operations teams get to work on the sort of rapid scaling and engineering challenges that pique my interest. I'm doubly excited to be able to share the fact that I'll be joining up simultaneously with Curt Micol (@asenchi) as the newest Operations Engineers on Mark's team. It's an odd coincidence that we're both big fans of BSD. Hopefully nobody holds that against us. ;-)

Needless to say I'm thrilled about the whole thing and hope that it gives me more cool stuff to write about here. Stay tuned.

Monitoring Sucks. Do Something About It.

2011-07-07 23:45:30 by jdixon

For as long as I can remember, systems administrators have bitched about the state of monitoring. Now, depending on who you ask, you might get a half dozen (or more) answers as to what "monitoring" actually means. Monitoring is most commonly used as a casual catch-all term to describe one or more pieces of software that perform host and service monitoring and basic trending (graphs or charts). But in most cases, these complaints are targeted at software responsible for daily fault detection and notifications for IT shops and Web Operations. The usual whipping boy is Nagios, a popular open-source monitoring project that supports a universe of host and service checks, notifications, escalations and more.

Nagios has been the "lesser of all evils" for quite some time. Its cost (free), extensibility (high) and configuration flexibility have helped it achieve significant adoption levels across a variety of industries and range of business sizes, from small one-man web startups to Fortune 500 enterprises. It's been forked multiple times and is recognized by industry analysts as a force to be reckoned with. Regardless, those who use it do so with a fair amount of hostility. Ask around and you're likely to find more users who stay with Nagios because it's "good enough" than those who actually like it. So why doesn't Nagios have more competition in the open-source marketplace? Largely because writing an entire monitoring system from scratch is an enormous undertaking. Ok, does that mean we should keep improving Nagios (or forking it... again)? Perhaps.

Read the rest of this story...

The Most Interesting Blog Post in the World

2011-06-08 23:11:40 by jdixon

The most interesting blog post in the world... is somewhere else. If I was one of those douchebags who tweets about an EXCITING NEW BLOG POST that is really just a vanity post to suck you in for artificial hits, then link you to the real story hosted elsewhere, you would find the link below.

But I'm not. So you won't.

Flying Cars and Food Capsules

2011-05-25 21:02:52 by jdixon

Today I was installing RHEL 6.0 on a remote Xen domU using virt-install with VNC. None of the Mac VNC clients I tried was able to render anything remotely usable. I tried various encoding schemes and color resolutions, to no avail. And where Chicken of the VNC rendered a screen seemingly inspired by LSD trips, RealVNC simply shit its pants and crashed.

So I downloaded an OpenBSD 4.9 iso and installed it in VMware Fusion. Installed tightvnc-viewer from packages. And in less than 10 minutes, I had a working X11-over-SSH tunnel to the remote Xen VNC console. From my Mac desktop. Through an OpenBSD VM. Across the fucking internet.

Welcome to the future. Sorta.

Trending with Purpose

2011-03-18 13:52:44 by jdixon

I threw together a presentation on short notice this week for an internal tele-conference about Trending with Purpose. The end result was much better than I might have expected (even given my penchant for procrastinating). Although much of the content is specific to applications currently in use at $DAYJOB, I think there's something to take out of it even if you're not using these tools.

The content is intended for developers who might not use (or know how to use) application profiling data to complement their operations' monitoring and trending efforts. Special props to the Orbitz.com developers for open-sourcing their Graphite graphing tool, as well as John Allspaw and the Etsy Engineering team for their work on StatsD, and for generally serving as innovators in the Web Operations industry.

Special note: These slides were thrown together in rapid fashion. Anyone who experiences violent reactions to Gill Sans Italic should not download this slideshow. You have been warned.

The slides are available here.

Double-Spacers are Not Evil

2011-01-14 08:46:01 by jdixon

Recently, a torrent of criticism has been unleashed towards "double-spacers", bitching and moaning about our excessive keystrokes. Single-spacers en masse are mocking our outdated beliefs. They trot out their modern typographers, quoting type-space rules and style guides. If the level of vitriol is any measure, we've caused them a great deal of distress.

Yes, I admit it, I'm a lifelong two-spacer. The habit was learned as early as middle school, drilled home in high school, and reinforced in college. It's the sort of thing that becomes ingrained and part of your muscle memory. So trust me when I say it's no small task to unlearn this behavior. But I have absolutely no qualms about changing my spacing if it's the right thing to do. In fact, I've already started.

I'm not quite sure what teed up these people and unleashed their rage over such an innocuous thing. Are their lives so methodical that a few extra pixels cause their stabilities to unravel? I have little doubt that what they're preaching is the truth. That's not the crux of my issue. I'm concerned over their lack of reasonable discourse; what they fail to realize is that we double-spacers were trained in this manner over many years. We want to do the right thing. I'd be happy to save a few extra keystrokes. But I don't need to be chided to gain the proper motivation to do so.

In other words, don't be a dick. This is not the typographical equivalent of the Crusades. Publish your findings and make your point. Oh, and give us time to adapt. My delete key is working overtime.

My MacBook Air Kicks Your Laptop's Ass

2010-12-16 22:45:43 by jdixon

I recently found myself in need of a new laptop. I've been using some version of Apple PowerBook or MacBook Pro over the last seven years. I've had a couple Thinkpads mixed in for good measure, but those were always as a secondary computing device, mainly for playing around with OpenBSD. Suffice it to say that I'm a big fan of Apple systems design (XServe and XServe RAID, not so much).

My last portable was a previous generation 15" MacBook Pro with the glossy screen. I won't miss the reflective display but the rest of the unit was solid. My only real gripe was the slow-as-molasses base hard drive (5400rpm, if I remember correctly). There's simply no way Apple should offer that in their premium laptops, especially since they market them as a premium product. Anyways, it was still faster than thin air, which is what I found myself holding after my last day at work.

The new MacBook Air lineup was something that caught my eye recently, particularly the 13" model. The price is a bit much for a "netbook", but one look at the top-of-the-line Air's specs and it compares favorably with most of the MacBook Pro line. Its 1440x900 resolution doesn't hurt either. But one thing that made me hesitate was the CPU... an Intel 2.13GHz Core 2 Duo. I work with VMware Fusion a lot so I was naturally concerned about any sort of performance issues. Hell, we can't even run Flash games on my daughter's Dell Mini 10v. So yeah, I was a little concerned.

Nevertheless, I took the plunge. And Oh [Your] God, was it worth it. This is my first experience with a real SSD drive. And let me tell you, it makes ALL the difference in the world. This Air runs VMware faster than my old MacBook Pro by an order of magnitude. I can suspend or resume Windows XP images in under 5 seconds. The same actions used to take upwards of 30 seconds on the Pro. It's pretty obvious by now that desktop virtualization is heavily I/O bound. The CPU just doesn't have much to do by comparison.

Everything else about the MacBook Air was as expected. It's a very lightweight form-factor with a great redesign of the port locations (and availability). I haven't had the opportunity to try out the Mini DisplayPort external output yet, but enjoy having a USB port on each side. The SD slot is also a nice touch but is pretty standard across laptops these days. Sleep and resume are almost instantaneous. The keyboard is full-sized and roomy.

In summary, I'm thrilled with my purchase. I've managed to shave off some old unused VMs to make room for my music collection, which used to exist on an external drive. I hate the idea of lugging around an external drive with such a petite portable, so I managed to find enough space on the 256GB SSD. This is quite literally the perfect laptop for me right now. I fully expect my OpenBSD friends to give me shit over it, and it's almost worth it.

A Little Time Off

2010-12-16 07:36:01 by jdixon

I always wondered what I could accomplish with a few weeks of free time. Never thought I'd have the chance to find out. Yeah, funny thing about that...

I've been Product Manager for an online monitoring service over the past 15 months. I've learned a lot about the full product cycle, building up all the components of a startup web company: cranking out a business plan, analyzing the competition, defining the roadmap, performing QA, etc. It's been a ton of fun, growing an Open Source trending application into a full-blown ECA/BSM suite. Now is the right time for me to hand it off to a full-blown sales organization and see it thrive. Which means I'm taking some time off to look for my next challenge.

I'm happy to report that the job market looks very strong right now. I'd love to believe that all the interest I'm getting is a byproduct of my years of experience and varied skillset, but I'm too self-loathing for that. Regardless, I'm still interviewing and haven't made up my mind yet. So if your startup/Web/SaaS/DevOps company is looking for a seasoned ops/network/security/engineering/product-managing type, drop me a line and let's chat. My CV/resume is available online.

Working with the Mojolicious Framework

Thanks to my reorganized shed-yule, I have the chance to catch up on some side projects. The first one is the evolution of NetFlow Dashboard as a SaaS service. Devon O'Dell has been doing some nifty stuff with the collector, while I've been focusing on the user-facing web and API stuff. I stumbled across the Mojolicious framework, and zomg, you can color me impressed. Compared to Catalyst, "Mojo" is a breath of fresh air. The syntax is actually quite similar to Dancer, but it goes a few steps further, adding placeholders (with optional regex constraints) and named routes. Take for example, the following snippet:

    use Mojolicious::Lite;

    # Defaults to login.html.ep. Defined before the placeholder route
    # so that /login is not swallowed by '/:foo'.
    get '/login' => 'login';

    # Handle the login form submission
    post '/login' => sub {
        my $self = shift;
        return $self->redirect_to('login') unless (
            $self->param('username') && $self->param('password')
        );
        # [ ... some code to authenticate user ... ]
        $self->render(text => "Welcome!");
    };

    # Route with placeholder
    get '/:foo' => sub {
        my $self = shift;
        return $self->redirect_to('login') unless ($self->session('username'));
        my $foo  = $self->param('foo');
        $self->render(text => "Hello from $foo!");
    };

    # Start the Mojolicious command system
    app->start;

    __DATA__

    @@ login.html.ep
    <!doctype html><html>
        <head>
            <%= content header => begin %>
                <title>Login</title>
            <% end %>
        </head>
        <body>
            <%= content body => begin %>
                <form method="post" action="/login">
                <input name="username">
                <input name="password" type="password">
                <input name="login" type="submit">
                </form>
            <% end %>
        </body>
    </html>

This epitomizes everything I enjoy about Perl code. TMTOWTDI, but without all the crufty framework directories and files that remind me of Ruby on Rails. Mojolicious::Lite is easy to read, easier to write, with all the shortcuts a strapping young web hacker might want. It's smart enough to inject common sense where it should (e.g. searching for templates by named route and format) but powerful enough to let me extend any of the underlying Mojolicious classes (like Catalyst). Good stuff.

Updates on the OpenBSD IPsec Gossip

2010-12-15 15:22:57 by jdixon

As expected, news of a possible ten-year-old collusion to introduce backdoors in the OpenBSD IPsec stack has spread like wildfire. Ars Technica, The Register, CNET and Forbes are among a long list of mainstream news outlets to chime in on these allegations.

Dag-Erling Smørgrav adds one point to my original commentary; that is, the action of introducing backdoor code into OpenBSD by the FBI would not fall under a "recently expired NDA", as Greg Perry claims. I think Dag is probably correct here. Even if Greg's claims are eventually proven true, something like this would more likely fall under a TOP SECRET (or even as high as TS/SCI) classification, which is typically declassified after a 25-year period. Releasing this information prematurely would land Greg in a steaming lake of hot water.

At least two of the named parties have already stepped forward to refute Greg's story. Scott Lowe posted to the openbsd-tech mailing list, stating that he does not have, nor has he ever had, any affiliation or employment with the FBI or the OpenBSD project. Jason Wright followed up a short while later, demanding an apology from Greg Perry and detailing which parts of the code base he worked on during the affected period.

" I will point out that Greg did not even work at NETSEC while the OCF development was going on. Before January of 2000 Greg had left NETSEC. The timeline for my involvement with IPSec can be clearly demonstrated by looking at the revision history of:
	src/sys/dev/pci/hifn7751.c (Dec 15, 1999)
	src/sys/crypto/cryptosoft.c (March 2000)
The real work on OCF did not begin in earnest until February 2000."

I'm personally relieved to see the accused parties step up and assert their innocence. Unfortunately, the story won't end here. The mere possibility of impropriety by these developers or the FBI means the OpenBSD project will have to work long and hard to regain its tarnished reputation. A thorough code audit is the only sure-fire way (and even then, is not guaranteed) to clear these charges.

If you'd like to help with the audit, please consider matching Dag-Erling Smørgrav's triple bounty, or better yet, donating directly to the OpenBSD project.

Deconstructing the OpenBSD IPsec Rumors

2010-12-14 21:58:01 by jdixon

Theo de Raadt posted an email to the openbsd-tech mailing list Tuesday evening which contained details of alleged backdoors added to the OpenBSD IPsec code by government contractors some ten years ago. Subsequent posts from Bob Beck and Damien Miller add further commentary, but neither confirm nor deny the allegations. Damien goes so far as to propose a number of possible avenues as the most likely places to begin a new audit.

One of the purported conspirators is Jason Wright, a cryptology expert at the Idaho National Laboratory, who committed a significant amount of crypto and sparc64 code to the OpenBSD project. Although I haven't seen Jason in years, I consider "Wookie" a good friend and hope these accusations are false. If Damien's hypothesis is correct, it seems highly unlikely that Jason (or any US developers) introduced backdoors directly into the crypto code. A more likely scenario would be the malicious reuse of mbufs in the network stack.

As Brian T. Merritt suggests, it seems even more likely that Linux would be similarly "exploited". Let's not forget that while these claims against OpenBSD revolve around FBI involvement, Linux has had significant portions of its security code infiltrated by the NSA. Between these two code bases you're talking about an enormous portion of the networking infrastructure that powers the Internet.

As a former OpenBSD committer, this saddens me. Not just because of the possibility that this might be true, but because, whether or not it is true, developer and community resources will be swallowed into the rumor vacuum for untold weeks and possibly months. This results in less innovation, fewer bugfixes, and worst of all, a growing distrust among everyone involved.

This story has all the characteristics of being newsworthy for a long while. It has already made major headlines across Twitter, Slashdot, Reddit and OSNews. Most articles and tweets imply that the claims are fact, without any investigation of the source claim or the actual code in question. I hope that all parties involved are cleared of any wrongdoing. Either way, the cat is out of the proverbial bag. These claims will undermine a significant portion of goodwill and trust among all Free Software / Open Source projects. In the end, nobody wins.

Impressions From NYCBSDCon 2010

2010-11-14 13:33:04 by jdixon

I was invited to give another talk at this year's NYCBSDCon. Motivated by Adam Jacob's Choose Your Own Adventure presentation at Velocity, I tried to weave together a series of smaller talks into one session. Unfortunately, a funny thing happened on the way to the forum. A week before the event, I fat-fingered some commands on my laptop and blew away the slides. The recreated version was quite a bit different than originally advertised, but I think it went pretty well. Comments ranged from "your best talk ever" to "good, but the pacing on BSD is Dying was better".

Worried that the main presentation would be too short, I threw in a bonus CYOA-style web story. This went over better than expected, so I've put it online if you want to see it for yourself.

Will Backman (of BSDTalk fame) was kind enough to provide me with the audio from my talk already. I'm going to start syncing it up with the slides this week and then perhaps later on with the video taken by Patrick (awesome A/V guy at the event).

My initial impressions from the event:

  • Nice building, so-so location. Cooper Union is hard to find at first (Google Maps has no idea) but it's an attractive facility. Complicating matters is that it isn't near a subway station, so I had to take a taxi from Penn Station. And apparently the stations that are close were closed due to construction.
  • Good social. They had a room reserved at the B Bar, which seems to be a hopping place. It got pretty loud inside once the native crowds rolled in, so quite a few of us rolled outside and had a very interesting discussion on the future of BSD innovation. My only complaint would be the lack of quality beer offerings. The choice of craft beers in your typical Maryland/DC establishment blows away what I saw in a couple places in New York.
  • Great turnout. I think George mentioned that they broke 200 registrations this year. That's a big jump up from the 130-or-so from 2008. Glad to see the conference growing, especially with the biennial scheduling.
  • Good talks, nothing mind-blowing. One of the themes I'd really like to hear more about is where BSD might be going with regards to virtualization and scalability. It's nice to hear about finished or ongoing development efforts, but I'd also like to hear what sort of roadmap the BSDs are working on (if it exists at all). Many BSD developers readily dismiss The Cloud as marketing buzz-speak, but the fact is that virtualization, scalability challenges and resource oversubscription are here to stay. I'm very happy to see FreeBSD adopt DTrace and ZFS from OpenSolaris (which start to get us there), but there's so much more to do.

As mentioned in my presentation, I fully expect this year's talk will be my last. Consider it the last chapter of my "BSD trilogy", as it were. I'm glad that so many people came out to hear me talk and seemed to enjoy themselves. I look forward to being just another attendee next year, waiting to see where the BSD movement takes us.

An Exit Strategy

2010-03-31 08:24:35 by jdixon

News broke recently that Oracle would begin enforcing old-school licensing policies on Solaris. The future of OpenSolaris has been in question for some time now. The writing is on the wall, in this geek's opinion. Oracle is a revenue-generating monster with blinders towards open source. The product manager in me appreciates the directness of it. The hacker in me despises them for raping Sun Microsystems and pulling the rug out from under the rest of us.

This will almost certainly renew interest in BSD distributions. Sure, Linux will get plenty of (re-)adopters migrating off Solaris. But keep in mind that many Solaris users left Linux for greener (read: more stable) pastures. They've tasted the delicacies of ZFS, DTrace, project Crossbow and zones. Linux is a big bitter pill to swallow after you've tried those.

Fortunately, users have a choice. Although I'm not a big user of FreeBSD myself, I appreciate the work they've put into porting ZFS and DTrace. They have OpenBSD's PF packet filter and experimental support for Valgrind. There are plenty of reasons to love FreeBSD right now. Suffice it to say that I'll be testing my alternatives and looking for an exit strategy from Oracle.

New Year's Resolutions

2010-01-01 22:28:02 by jdixon

I'm not sure how effective it is to post these here, but I'm hopeful that having them in cyberspace will help keep me motivated. I'm hereafter calling these goals rather than resolutions. The latter, to me, implies something that you begin immediately. This cold-turkey approach virtually guarantees failure. The moment you trip up, the subconscious immediately considers them a lost cause and reverts to the old behavior. As goals, I think it sets a more optimistic tone and allows me to gradually adopt the preferred conduct.

Without further ado, my personal list of goals for this year (in no particular order)...

Read the rest of this story...

Rudolph the Bastard Reindeer

2009-12-27 21:52:43 by jdixon

I'm probably not treading into undiscovered territory, but having re-watched a number of my favorite Christmas specials as an adult, I couldn't help but notice the influences of an earlier, simpler, uglier life in America. Rudolph the Red-Nosed Reindeer had an especially hellish upbringing in the shadow of Claus and his elven slave-drivers, according to the storytellers at Rankin/Bass Productions, Inc.

Overtones of discrimination and the Old South start from the very beginning. The comforting tone of Burl Ives as the friendly, banjo-toting, good ol' boy snowman-narrator helps to lighten the apocalyptic mood of newsreel footage foreshadowing the storm that inevitably vindicates Rudolph's hapless existence.

Read the rest of this story...

Managing Expectations

2009-12-17 14:47:35 by jdixon

If you're unaware, there's an Advent calendar for Systems Administrators. Strangely enough, they accepted my submission and published it this past weekend. I believe these philosophies will benefit anyone who has "internal customers", but they are especially well-suited for IT professionals. If you have other suggestions please let me know.

Announcing Blogsum-1.0

2009-11-14 12:56:56 by jdixon

I'm happy to announce the release of Blogsum-1.0. This release includes a number of bugfixes and a couple enhancements over 0.9:

  • Fixed preview mode
    Preview content is now encoded so markup will always get recreated properly in your browser.
  • Tag Cloud
    Thanks to Jim Razmus who submitted this new feature. Make sure you add the new $max_tags_in_cloud setting to your local Blogsum.pm.
  • Update date when (re-)publishing
    The published timestamp updates when you publish or republish an article.
  • Fix timezones in db
    Fixed a bug where article or comment timestamps were always set to GMT instead of localtime.
  • Fix pagination
    Removed pagination view from all non-default views. That is to say, we shouldn't paginate when viewing by year/month or tag filters.
  • Minor aesthetic improvements
    Lots of whitespace fixes, a redesigned footer and the addition of a meta generator tag for Blogsum.
  • Example httpd.conf for Apache-2.x
    Thanks to Dan Colish for testing Blogsum with Apache-2.x and submitting his configuration example. This has been added to the examples directory as httpd2-blogsum.conf.

I'd like to also thank Johan Huldtgren for submitting Blogsum to the FreeBSD ports tree for inclusion. It has been accepted and will likely bring many new Blogsum users, which will inevitably cause me to struggle even harder against the onslaught of feature requests. ;)

Just kidding, I'm glad to see Blogsum gaining interest in the community. I've also updated the OpenBSD port, if you happen to be using that instead of following svn. Enjoy!

Where Obfuscurity Meets Negligence

2009-10-21 19:24:42 by jdixon

There are people out there who would argue that Security through Obscurity is better than no security at all. They advocate port knocking or running applications on "random" ports. Certainly, I'm not one to go around broadcasting my attack vectors to random visitors (oops!), but that doesn't mean it's a rational means of protection (honest, I'll pull out this time).

Read the rest of this story...

Pressing Needs

2009-10-21 14:12:39 by jdixon

Love fonts? Check out the work by Jessica Hische over at the Daily Drop Cap. I stumbled across her work thanks to an interview linked by @shiflett. Believe it or not, I was an art geek before I went full-bore UNIX geek. I still have an appreciation for the analog arts even though I'm a left-brainer now.

Jessica's work is very impressive. It almost makes me want to try my hand at letterpress. And then I saw Pictorial Webster's and became afraid. Very afraid. Just kidding, it's unbelievably cool no matter which half of your brain dominates. Check it out now!

Let Your Mutt Growl

2009-10-18 00:30:45 by jdixon

Like any self-respecting UNIX user, I consume most of my email through the console. Mutt has been my client of choice for a few years now. I used to be a die-hard Mail.app fan on my Apple systems, but the performance was abysmal. As time went on, I evolved from running Mutt on my laptop to running it in screen on a home server. Combined with imapfilter's client-side "push filtering", this allowed me to keep my existing mailserver architecture intact (outside the scope of this post) while gaining all the functionality I missed from a traditional fat mail client.

Recently my Facebook and Twitter Attention Span Syndrome (FaTASS) has peaked, motivating me to find creative solutions for managing the extra load. Growl is a very popular notification system that Mac OS X users have enjoyed for years. I haven't found myself wanting for it before, mainly because I don't use an abundance of GUI apps for my daily tasks. And yet, Growl's unobtrusive nature and support for network events seemed the perfect fit.

Read the rest of this story...

Passing Fail

2009-09-29 14:19:01 by jdixon

I've heard all sorts of stereotypes about bad drivers. Usually they're racist or sexist (or both). And although it's politically incorrect to agree with them, there's almost always a sliver of truth hidden inside. But it's pretty rare to hear a specific criticism about an entire state of drivers (other than "they suck").

Maryland drivers are quickly gaining a reputation for cruising in the Passing Lane. This might sound like a minor gripe, but you have to consider that the entire state of Maryland is the size of a pimple on Virginia's forehead. Most of the highways are two- or three-lane affairs, not including Interstate 95 or the Baltimore and DC perimeters. Monopolizing the Passing Lane can have a significant effect on the normal flow of traffic, and not just during rush hour.

Sometimes I try to consider what they might say if I confronted one of them. I bet they would argue that "I'm already going the speed limit in the fast lane, you're just trying to go faster. You're the unsafe one!". Therein lies the crux; it's not a fast lane, it's a Passing Lane (excessive and redundant emphasis mine). There's nothing subjective about the intended purpose of that lane. It's intended for passing slower traffic. It's not there as your personal safety zone, and it's certainly not yours to do with as you please. You're free to use it for the purposes of Passing vehicles. Once that's over, GTFO of my way.

That is all. Safe motoring, everyone.

Shooting a Barrelfish of Monkeys

2009-09-26 23:29:47 by jdixon

Stumbled across the Barrelfish project over at OSnews. The proof-of-concept Operating System appears to borrow concepts from distributed systems design. Rather than have a single kernel managing multiple cores, the Multicore kernel assumes no inter-core sharing and communicates with message passing. Presumably they've been able to overcome some of the traditional performance hits there.

I was particularly pleased to see their first release distributed under a BSD-style license. Those crazy bastards at Microsoft, what's next... a Windows release that doesn't suck?

Mad Stylers in Demand

2009-09-20 23:04:53 by jdixon

Blogsum is quickly reaching the point where the focus is on style rather than substance. This is a good thing, of course; all of the core features envisioned for Blogsum are complete. If you've been paying attention at home you might have noticed that the directory layout has been tweaked a bit this weekend. I think these changes will make it much easier to support user modifications and third-party style templates.

The preference for Blogsum styling is to just modify the CSS stylesheet. However, users are also free to modify the images and HTML templates if they so desire. The structure is pretty straightforward:

/blogsum/themes/
               /$blog_theme/
                           /images/*.gif
                           /admin.tmpl
                           /index.tmpl
                           /style.css

The default theme is obviously contained in /blogsum/themes/default/ and shouldn't be modified. Copying the entire contents to a new theme directory is enough to get started. Make sure to set $blog_theme in your Config.pm. The only images currently included are used in the Admin view for managing articles.
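The copy step described above can be sketched in a few lines. This is only an illustration, not part of Blogsum itself (which is a Perl application): the create_theme helper and its arguments are hypothetical names, and the themes directory is assumed to follow the layout shown earlier.

```python
import shutil
from pathlib import Path


def create_theme(themes_dir, new_theme, base="default"):
    """Copy the stock theme to a new directory so the default stays untouched.

    Hypothetical helper: themes_dir is assumed to be the /blogsum/themes/
    directory and new_theme the value you plan to use for $blog_theme.
    """
    themes = Path(themes_dir)
    source = themes / base
    target = themes / new_theme

    if not source.is_dir():
        raise FileNotFoundError(f"base theme not found: {source}")
    if target.exists():
        # Refuse to overwrite an existing theme by accident.
        raise FileExistsError(f"theme already exists: {target}")

    # Copies images/, admin.tmpl, index.tmpl and style.css in one go.
    shutil.copytree(source, target)
    return target
```

Calling create_theme("/blogsum/themes", "mytheme") would leave you with a themes/mytheme/ directory to edit; setting $blog_theme to the new name is still a manual step.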

P.S. There is now a Blogsum-users mailing list available for general questions and discussion about the project. If you happen to craft a new theme, please let us know!

Business Metrics

2009-09-15 23:58:42 by jdixon

Somewhere between our first corrupt filesystem and an unlikely ascent to CTO, all Systems Administrators are taught to monitor their systems. We're trained to monitor the health of our computers and trend the usage for capacity planning and analytics. A Nagios is deployed; eventually complemented by Cacti; both of which are inevitably supplanted by Something Enterprise (TM). Services are checked, change is managed, and reports are reportified.

Have you asked yourself, what value does this offer my company? Perhaps you've correlated your database connection breakdown time with website load time. Or you noticed that the FULL backups on Sunday coincide with excessive packet loss on your Seattle firewalls. Besides buffing out some of the rough edges on your operational capabilities, how does this data work for you?

Read the rest of this story...

Wings of Blurry

2009-09-11 10:06:38 by jdixon

This past Monday, the kids and I sat on the front porch watching bees buzzing into the purple trumpets of our hosta. We followed the long stems of the plant as they bowed and bounced while the curious insects went about their work. In a fleeting moment, we heard an impossibly loud buzz coming from overhead and then shoot past us. A large hummingbird paused, directly above the hosta. It considered the plant for a moment, wings in full turbulence, then zipped away to its next destination.

I love how some of life's coolest moments are painfully brief. It leaves you wanting more.

Updates on Blogsum

2009-08-30 21:07:54 by jdixon

Minor features are still being added to Blogsum. It supports searching by author (effectively treating authors like tags) and the ability to disable comment submissions. There is also readmore support, allowing you to define a portion of an article that should only be seen in full "article view". You simply insert a <!--readmore--> tag where you'd like the "preview mode" to stop.
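To make the readmore behavior concrete, here is a minimal sketch of how a marker like this can split an article. It is not Blogsum's actual code (Blogsum is written in Perl); the function names are hypothetical, and the assumption is simply that everything before the first marker is the preview.

```python
READMORE_MARKER = "<!--readmore-->"


def preview(body: str) -> str:
    """Return the portion of an article shown outside full article view."""
    head, marker, _rest = body.partition(READMORE_MARKER)
    # If no marker is present, the whole article doubles as the preview.
    return head.rstrip() if marker else body


def full_article(body: str) -> str:
    """Return the complete article with the first marker removed."""
    return body.replace(READMORE_MARKER, "", 1)
```

For an article body of "Intro.\n<!--readmore-->\nDetails.", preview() yields just the intro, while full_article() yields both parts without the marker.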

I'm also adding email notifications for comment submissions. This way you'll know the instant a new comment requires moderation. I should be done with this very soon. The last couple of items on my To-Do list are pagination and cleaning up the template usage. Once these are complete it should be ready for submission to the OpenBSD ports framework.

Update: Email notifications for comment submissions are complete.

OpenBSD as an LDAP Client

2009-08-27 22:33:50 by jdixon

OpenBSD's ypldap daemon provides YP maps using an LDAP backend. It was introduced with OpenBSD 4.4 but doesn't seem to have received much exposure within the community. I've been meaning to convert one of our bastion systems from using local accounts to LDAP, mainly for convenience.

The migration went smoothly except for the lack of a netid.byname mapping. Pierre-Yves Ritschard told me this is high on his to-do list. Without this mapping, sudo is unable to getpwuid(). Therefore, any accounts requiring sudo rights (read: administrators) will need to remain as local accounts until this is resolved.

The vast majority of this write-up was taken almost verbatim from a similar posting at the Helion-Prime Solutions blog. I've filled in some missing bits with regards to the sudo issue as well as ypbind issues over non-broadcast segments.

Read the rest of this story...

Your Mom is Crazy

2009-08-24 11:37:37 by jdixon

When people ask you what you do for a living, do you answer "geek"? While shopping for a new car, is your primary criterion "good, fast, cheap... pick two"? Did you get goosebumps the first time you played with VMware's virtual switching/VLAN support? If so, you might be a perfect fit for our team.

OmniTI is looking for someone with real UNIX chops. We have a passion for what we do and it shows. A typical day in the Ops team is a heaping pile of scalability, smothered with resiliency, and a smattering of optimization. We eat and drink Open Source. We poop cold steel. You will be tempered, and you'll love every minute of it. If this sounds like your sort of thing, shoot me a line so we can talk.

P.S. We're the place your mom warned you about.

The Doctrine of Security

2009-08-21 23:22:36 by jdixon

Recently I had the opportunity to do an interview for a story on SMB security issues. The conversation reminded me just how easy it is, as a security professional, to paint everything in black and white. Hackers are good or evil. Software is secure or vulnerable. Vendors are responsible or stupid. But this really isn't how businesses operate.

The primary focus of most businesses is to engage in commerce. Often we overlook this basic fact when a company neglects to patch their systems and becomes a target. We argue that if the owner was serious about protecting his money, customers or data he would be more proactive. But do we have all the facts to make this judgment?

Every decision in business carries risks and rewards. Responsible patching seems like a no-brainer. Perhaps the company webserver is used for basic order submissions. No personal or private data is stored locally. Is it really harming anyone if the website gets defaced for a week until the owner's nephew stops by to reinstall it again? Certainly you could argue that the defacement reflects poorly on the business, but again we need to consider the risk vs reward scenario. If it costs less to leave a defaced server running than to call an after-hours professional, is that really a poor decision?

Don't get me wrong, this scenario would drive me nuts. And that's exactly why I'm a geek and not an accountant. On occasion we need to take our blinders off and consider the alternatives. Security is a process, not a moral standard.

Fore!

2009-08-19 12:24:19 by jdixon

Sometimes we all have difficult days. The alarm goes off at 5am for an early start. Traffic is a bitch. Hardware breaks, data corrupts, services lock up, drives fill up and servers crash. Co-workers disagree and people yell. The pager likes the sound of its own voice.

Once in a while, these days happen. We forge through them with a restless eye at the clock, waiting for it to be over. At the conclusion, can we look in the mirror and be proud of our efforts, or is there regret for the should'ves? It's hard to be passionate every day. When the cogs are aligned and the ship runs smoothly, passion stokes our fire and gives flight to new ideas. But a foul day can quickly drain our passion and result in poor judgment and apathy.

I used to play golf a lot. When I worked the graveyard shift, I routinely teed off at the end of my day. While most of the other golfers feared the dreaded sand-trap, I reveled in the opportunity. The chance to easily "save out" and focus on the next hole. Being able to meet these obstacles as opportunities adjusts our perception and can inspire us to greater heights.

Pros and Cons

2009-08-18 02:45:46 by jdixon

I've unofficially kicked off the pre-planning phase of DCBSDCon 2010 for tossing around ideas and informal preparations. If you're interested in becoming an event organizer (think carefully) or sponsor (spend graciously), I'd love to hear from you now. We'll be recruiting event volunteers as the New Year gets closer.

A lot of friends have been asking me when the event will happen. There's a strong possibility that it will be pushed back from February (where the F stands for effing cold) to April. DC weather is much more cooperative during the spring months (cherry blossoms anyone?).

Noit Grows Hair on Your Chest

2009-08-15 14:09:13 by jdixon

Todd Hoff over at High Scalability takes a look at Reconnoiter. He went through the [currently] arduous task of installing and configuring it manually; setting up checks can be a hairy experience. But the end result seems to justify the initial pain. It's a very exciting (and useful) application that will only get better as the #noit devs continue to hack on it.

As an Ops guy over at OmniTI, I've been fortunate to watch Reconnoiter's incubation process. Theo Schlossnagle is probably one of the smartest guys in this industry and he gets scalability issues. We've batted around ideas about network trend and analysis tools before (e.g. NFDB) so naturally I'm anxious to see where Noit takes us.

Shiny Objects and WTFs

2009-08-13 03:42:54 by jdixon

I've never claimed to be a prolific hacker. I take much longer to complete a simple piece of code than even your typical hobbyist programmer. I'm easily distracted by shiny objects and WTFs.

Nevertheless, I finally gave in and threw together something resembling a blogging app. There are no fancy features yet, and likely never will be. It currently does about 90% of what I want it to do, which is closer to 2% of what the typical blogging/CMS application is capable of. It's my own KISS approach with a healthy peppering of careful input handling and a simple SQLite backend.

If you've been looking for a small blog application, particularly one designed for running in OpenBSD's default httpd(8) chroot, then Blogsum might be good for you. If not, that's ok too. Let the next guy have his World Domination. I just want to blog some.

Introducing Blogsum

2009-08-10 19:16:13 by jdixon

This is an in-development version of blogsum. The goal is a simple, secure blogging application that doesn't come with useless knobs or hurdles.

The anti-wordpress.