2012-04-19 08:24:20 by jdixon
I love that Graphite can support per-second resolution. We've started to use it more frequently with applications that emit a constant stream of metrics to one of our aggregators. But there are times when an application might send updates less frequently, or when transient failures or network congestion result in lost metrics. In these cases it makes sense to adjust your xFilesFactor value.
You may remember my last post that mentioned the whisper-info.py utility. It helps you extract metadata from your whisper files. Take, for example, a whisper file for one of our collectd metrics:
    $ sudo whisper-info.py /data/whisper/collectd/63694/swap/used.wsp
    maxRetention: 31536000
    xFilesFactor: 0.5
    aggregationMethod: average
    fileSize: 534580
    Archive 0
    retention: 86400
    secondsPerPoint: 60
    points: 1440
    size: 17280
    offset: 52
    ...
Notice the default xFilesFactor value of 0.5. This is the minimum fraction of datapoints in a higher-precision archive that must hold real values for them to be rolled up into the next archive. If fewer than half of those slots contain actual measurements (i.e. more than half are null), the rolled-up value is stored as null. Because our collectd interval defaults to reporting every 10 seconds, this is unlikely to ever become an issue.
However, for high-frequency (and potentially high-latency) metrics, we want to lower it far enough that null rollups are avoided. In the example below we've lowered xFilesFactor to ensure a valid rollup as long as a single datapoint is received.
    $ sudo whisper-info.py /data/whisper/pulse/amqp-receives-per-second.wsp
    maxRetention: 31536000
    xFilesFactor: 0.0
    aggregationMethod: average
    fileSize: 538192
    Archive 0
    retention: 300
    secondsPerPoint: 1
    points: 300
    size: 3600
    offset: 64
    ...
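To make the difference between 0.5 and 0.0 concrete, here's a minimal Python sketch of the check that gets applied when one block of high-precision points is rolled up. The `rollup` function and its arguments are names I made up for illustration, and only the average method is modeled. It reflects my understanding of whisper's behavior rather than its actual code.

```python
def rollup(points, x_files_factor):
    """Average one block of high-precision points into a single value.

    `points` is a list in which None marks a missing datapoint. Returns
    None when too few real values are present for the given xFilesFactor.
    """
    known = [p for p in points if p is not None]
    if not known:
        # Nothing to aggregate, whatever the xFilesFactor is.
        return None
    if len(known) / len(points) < x_files_factor:
        # Not enough real measurements: the rollup is stored as null.
        return None
    return sum(known) / len(known)


# One minute of per-second data in which only three datapoints arrived.
sparse_minute = [None] * 57 + [4.0, 5.0, 6.0]

print(rollup(sparse_minute, 0.5))  # None: 3/60 is below 0.5
print(rollup(sparse_minute, 0.0))  # 5.0: a single known point is enough
```

With the default of 0.5 the sparse minute is dropped, while 0.0 keeps whatever did arrive.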
xFilesFactor can be defined in storage-aggregation.conf to set the default value for any new whisper files.
    [pulse]
    pattern = ^pulse\.
    xfilesfactor = 0.0
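If you want to sanity-check which metrics a pattern will catch before creating any files, a rough Python sketch like the following can help. It is not carbon's own loader: `load_rules` and `x_files_factor_for` are hypothetical helpers, and I'm assuming that the first matching section wins and that unmatched metrics fall back to the 0.5 default shown earlier.

```python
import configparser
import re

# The same section as above, inlined so the sketch is self-contained.
CONF = r"""
[pulse]
pattern = ^pulse\.
xfilesfactor = 0.0
"""


def load_rules(text):
    """Return (compiled pattern, xFilesFactor) pairs in file order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [
        (re.compile(parser.get(name, "pattern")),
         parser.getfloat(name, "xfilesfactor"))
        for name in parser.sections()
    ]


def x_files_factor_for(metric, rules, fallback=0.5):
    """Pick the xFilesFactor of the first rule whose pattern matches."""
    for pattern, xff in rules:
        if pattern.search(metric):
            return xff
    return fallback  # assumed default when nothing matches


rules = load_rules(CONF)
print(x_files_factor_for("pulse.amqp-receives-per-second", rules))  # 0.0
print(x_files_factor_for("collectd.63694.swap.used", rules))        # 0.5
```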
If you need to adjust the xFilesFactor value for existing metrics, employ the whisper-resize.py tool:
    $ sudo su -c "umask 0033; whisper-resize.py --xFilesFactor=0.0 --nobackup \
        /data/whisper/pulse/amqp-publishes-per-second.wsp 1s:5m 1m:1d 5m:28d 15m:1y" carbon
    Retrieving all data from the archives
    Creating new whisper database: amqp-publishes-per-second.wsp.tmp
    Created: amqp-publishes-per-second.wsp.tmp (534580 bytes)
    Migrating data...
    Renaming old database to: amqp-publishes-per-second.wsp.bak
    Renaming new database to: amqp-publishes-per-second.wsp
    Unlinking backup: amqp-publishes-per-second.wsp.bak
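The retention arguments passed to whisper-resize.py also decide how many fine-grained points feed each rollup, which is what xFilesFactor is measured against. Here's a small Python sketch of that arithmetic. The parser is my own simplification that only handles the unit-suffixed form used above, the helper names are hypothetical, and I'm assuming each archive rolls up into the next coarser one.

```python
import math

# Seconds per unit; a year is 365 days, matching maxRetention: 31536000.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert a value such as '5m' or '28d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retention(definition):
    """Turn 'precision:retention' into (secondsPerPoint, points)."""
    precision, retention = definition.split(":")
    seconds_per_point = to_seconds(precision)
    return seconds_per_point, to_seconds(retention) // seconds_per_point


def min_known(fan_in, x_files_factor):
    """Fewest real datapoints a rollup needs (always at least one)."""
    return max(1, math.ceil(x_files_factor * fan_in))


archives = [parse_retention(d) for d in "1s:5m 1m:1d 5m:28d 15m:1y".split()]

# Compare each archive with the next coarser one.
for (fine, _), (coarse, _) in zip(archives, archives[1:]):
    fan_in = coarse // fine
    print(f"{fine}s -> {coarse}s: {fan_in} points per rollup, "
          f"{min_known(fan_in, 0.5)} needed at 0.5, "
          f"{min_known(fan_in, 0.0)} needed at 0.0")
```

For the 1s to 1m rollup, half of 60 points would have to arrive at the default setting, which is a lot to ask of a lossy per-second stream.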
Update/Correction: Michael Leinartas clued me in to the fact that xFilesFactor should be configured in storage-aggregation.conf, not in storage-schemas.conf as I originally stated. This will eventually change (the configuration files are converging post-1.0-release), but it is valid as of Graphite 0.9.9.