Unhelpful Graphite Tip #9 - xFilesFactor

2012-04-19 08:24:20 by jdixon

I love that Graphite can support per-second resolution. We've started to use it more frequently with applications that emit a constant stream of metrics to one of our aggregators. But there are times when an application might send updates less frequently, or when transient failures or network congestion result in lost metrics. In these cases it makes sense to adjust your xFilesFactor value.

You may remember my last post that mentioned the whisper-info.py utility. It helps you extract metadata from your whisper files. Take, for example, a whisper file for one of our collectd metrics:

$ sudo whisper-info.py /data/whisper/collectd/63694/swap/used.wsp

maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 534580

Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 52


Notice the default xFilesFactor value of 0.5. This is the minimum fraction of higher-precision datapoints in a rollup interval that must hold real values for the rollup to be written. If fewer than half of those slots contain actual measurements (i.e. more than half are null), the aggregated datapoint is stored as null. Because our collectd interval defaults to reporting every 10 seconds, this is unlikely to ever become an issue.
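
To make the rule concrete, here is a rough Python sketch of the decision. It is a simplified model for illustration, not whisper's actual code: rollup() and its arguments are hypothetical names, and it only covers the average aggregation method.

```python
def rollup(points, xfiles_factor):
    """Aggregate one interval of higher-precision points into a single value.

    `points` holds floats, with None standing in for a missing datapoint.
    Returns None when too few points are known to satisfy xfiles_factor.
    """
    known = [p for p in points if p is not None]
    if not known:
        # Nothing was received at all, so there is nothing to aggregate.
        return None
    if len(known) / len(points) < xfiles_factor:
        # Too sparse for this xFilesFactor: the rollup stays null.
        return None
    # Nulls are skipped, not counted as zero.
    return sum(known) / len(known)


# Sixty 1-second slots feeding a single 1-minute datapoint.
sparse = [42.0] + [None] * 59
print(rollup(sparse, 0.5))  # None: only 1 of 60 slots is known
print(rollup(sparse, 0.0))  # 42.0: a single datapoint is enough
```

Note that in this model an interval with no datapoints at all still rolls up to null, even with a factor of 0.0.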

However, for high-frequency (and potentially high-latency) metrics, we want to lower it to a sufficiently low ratio that null archives can be avoided. In the example below we've lowered xFilesFactor to ensure a valid rollup as long as a single datapoint is received.

$ sudo whisper-info.py /data/whisper/pulse/amqp-receives-per-second.wsp

maxRetention: 31536000
xFilesFactor: 0.0
aggregationMethod: average
fileSize: 538192

Archive 0
retention: 300
secondsPerPoint: 1
points: 300
size: 3600
offset: 64
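
As an aside, the size, offset and fileSize numbers in this output can be reproduced with a little arithmetic. The sketch below is mine, not part of whisper: layout() and to_seconds() are hypothetical helpers, the record sizes (a 16-byte file header, 12 bytes per archive header entry, 12 bytes per datapoint) are assumptions about the on-disk format, and I'm assuming this file uses the 1s:5m 1m:1d 5m:28d 15m:1y retentions that appear later in this post.

```python
import re

# Assumed record sizes for the whisper file format (not stated in this post).
METADATA_SIZE = 16      # file header
ARCHIVE_INFO_SIZE = 12  # one header entry per archive
POINT_SIZE = 12         # one (timestamp, value) datapoint

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert a duration such as '5m' or '1y' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def layout(retentions):
    """Return (archives, file_size) for a string like '1s:5m 1m:1d'."""
    parsed = []
    for item in retentions.split():
        precision, retention = item.split(":")
        seconds_per_point = to_seconds(precision)
        points = to_seconds(retention) // seconds_per_point
        parsed.append((seconds_per_point, points))

    # Archive data starts right after the file header and archive headers.
    offset = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(parsed)
    archives = []
    for seconds_per_point, points in parsed:
        size = points * POINT_SIZE
        archives.append({"secondsPerPoint": seconds_per_point, "points": points,
                         "size": size, "offset": offset})
        offset += size
    # After the last archive, the running offset equals the file size.
    return archives, offset


archives, file_size = layout("1s:5m 1m:1d 5m:28d 15m:1y")
print(archives[0])
print(file_size)
```

Under those assumptions the first archive comes out as 300 points and 3600 bytes at offset 64, and the total is 538192 bytes, which agrees with the whisper-info.py output above.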


xFilesFactor can be defined in storage-aggregation.conf to set the default value for any new whisper files whose metric names match the pattern (the rule is shown here without its [section] header):

pattern = ^pulse\.
xfilesfactor = 0.0
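
If you want to check which metrics a rule like this would catch, the following Python sketch may help. It is not carbon's code: load_rules() and xff_for() are hypothetical helpers, the [pulse] section name is made up, and I'm assuming that rules are tried in file order with the first match winning and that unmatched metrics fall back to the 0.5 default.

```python
import configparser
import re

# The rule from this post, wrapped in a hypothetical [pulse] section.
EXAMPLE_CONF = r"""
[pulse]
pattern = ^pulse\.
xfilesfactor = 0.0
"""

DEFAULT_XFF = 0.5  # assumed fallback when no rule matches


def load_rules(text):
    """Parse aggregation rules into a list of (compiled pattern, xFilesFactor)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rules = []
    for section in parser.sections():
        pattern = re.compile(parser.get(section, "pattern"))
        rules.append((pattern, parser.getfloat(section, "xfilesfactor")))
    return rules


def xff_for(metric, rules):
    """Return the xFilesFactor of the first rule whose pattern matches."""
    for pattern, xff in rules:
        if pattern.search(metric):
            return xff
    return DEFAULT_XFF


rules = load_rules(EXAMPLE_CONF)
print(xff_for("pulse.amqp-receives-per-second", rules))  # 0.0
print(xff_for("collectd.63694.swap.used", rules))        # 0.5
```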

If you need to adjust the xFilesFactor value for existing metrics, employ the whisper-resize.py tool:

$ sudo su -c "umask 0033; whisper-resize.py --xFilesFactor=0.0 --nobackup \
  /data/whisper/pulse/amqp-publishes-per-second.wsp 1s:5m 1m:1d 5m:28d 15m:1y" carbon

Retrieving all data from the archives
Creating new whisper database: amqp-publishes-per-second.wsp.tmp
Created: amqp-publishes-per-second.wsp.tmp (534580 bytes)
Migrating data...
Renaming old database to: amqp-publishes-per-second.wsp.bak
Renaming new database to: amqp-publishes-per-second.wsp
Unlinking backup: amqp-publishes-per-second.wsp.bak

Update/Correction: Michael Leinartas clued me into the fact that xFilesFactor should be configured in storage-aggregation.conf, not in storage-schemas.conf as I originally stated. This will eventually change (the configuration files are converging post-1.0-release), but this is valid as of Graphite 0.9.9.


at 2012-06-06 12:14:46, Bethany wrote in to say...

Thanks so much! This combined with your post about piping graphite json to spark helped me work through a blank-graphs-beyond-first-storage-retention-period issue.

at 2012-10-21 11:46:41, alfredo wrote in to say...

xfilesfactor was what I was looking for! It allowed me to summarize metrics with a lot of null values! But I have one question:

What is the purpose of the "offset" that is shown when you execute whisper-info.py?

at 2012-10-21 11:52:24, Jason Dixon wrote in to say...

@alfredo - The offset is the location in the whisper file where each archive starts.

at 2012-10-21 12:26:59, alfredo wrote in to say...

Thanks for your answer! I still have much to learn.

at 2013-06-14 05:46:25, Andji wrote in to say...

Sorry for posting in an old thread, but how does the averaging function interpret nulls? I would like it to treat them as 0, so that with xFilesFactor = 0.0 I could have the average number of events per time period.

Is that the case?

at 2013-06-14 09:14:42, Jason Dixon wrote in to say...

@Andji - Then you want to use the transformNull() function. http://graphite.readthedocs.org/en/0.9.10/functions.html#graphite.render.functions.transformNull

at 2014-12-01 18:38:47, Jameson Lopp wrote in to say...

For anyone who needs to migrate a bunch of whisper databases after making this change, here's a simple bash command you can execute from the root whisper directory:

find . -name '*.wsp' -exec whisper-resize.py --xFilesFactor=0.0 --nobackup {} 1s:5m 1m:1d 5m:28d 15m:1y \;

at 2015-12-13 17:20:14, gevorg wrote in to say...

Can you explain why decreasing the xFilesFactor affects the disk io usage of carbon?

I noticed a jump from about 200 iops to about 1000 iops per host across a 4 node cluster. I saw the increase in iops immediately after updating the xFilesFactor across all whisper files. Went from 0.5 to 0.01 for averages and 0.5 to 0.0 for summations. The increase in iops was only for writes.

at 2015-12-28 12:34:42, Jason Dixon wrote in to say...

@gevorg It stands to reason that lowering your xFilesFactor would increase writes to some degree (although how much depends on your unique environment) since the inverse (a high xFilesFactor) would lead to decreased writes: null is the default state for any pre-allocated Whisper datapoint.
