Migrating Graphite from SQLite to PostgreSQL

2013-12-14 19:29:20 by jdixon

As mentioned in my previous article, I no longer recommend using SQLite as a Graphite backend for anything outside of development or testing work. It is too lenient with data types, and doesn't provide the levels of concurrency I'd like to see in an RDBMS for a production web service.

This opinion was cultivated almost exclusively from my recent experience migrating a single-node Graphite instance with an SQLite database to an HA pair of Graphite nodes with a shared PostgreSQL backend. For those of you considering migrating off SQLite to PostgreSQL, this article documents my initial struggles and eventual fixes for this transition.

First things first. I started by dumping the data from our existing SQLite database to a JSON file. We'll need this later on to import into our new database.

$ cd /opt/graphite/webapp/graphite/
$ python manage.py dumpdata > ~/initial_data.json

Next, we'll jump over to the PostgreSQL server and create a new role with a password. Nothing exciting here.

[email protected]:~$ createuser -E -P graphite
Enter password for new role:
Enter it again:

[email protected]:~$ createdb -O graphite graphite

Back on the Graphite server, I installed the Python adapter for PostgreSQL (python-psycopg2 on Ubuntu) and configured the new database settings in local_settings.py.

DATABASES = {
  'default': {
    'NAME':     'graphite',
    'ENGINE':   'django.db.backends.postgresql_psycopg2',
    'USER':     'graphite',
    'PASSWORD': 'XXXXXXXXXXXXXXXX',
    'HOST':     'postgres-0.example.com',
    'PORT':     '5432'
  }
}

At this point we'd be ready to import our Django fixture, which contains all of the data from our old database. I've demonstrated this method previously as a way to bootstrap the Django fixture when automating Graphite servers. Unfortunately, this is where I hit my first sign of trouble.

$ cd /opt/graphite/webapp/graphite
$ sudo cp ~/initial_data.json .
$ sudo python manage.py syncdb --noinput
Creating tables ...
Creating table account_profile
Creating table account_variable
Creating table account_view
Creating table account_window
Creating table account_mygraph
Creating table dashboard_dashboard_owners
Creating table dashboard_dashboard
Creating table events_event
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table auth_message
Creating table django_session
Creating table django_admin_log
Creating table django_content_type
Creating table tagging_tag
Creating table tagging_taggeditem
Installing custom SQL ...
Installing indexes ...
Problem installing fixture 'initial_data.json': Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/management/commands/loaddata.py", 
    line 174, in handle obj.save(using=using)
  File "/usr/lib/python2.7/dist-packages/django/core/serializers/base.py", line 165, 
    in save models.Model.save_base(self.object, using=using, raw=True)
  File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 553, 
    in save_base result = manager._insert(values, return_id=update_pk, using=using)
  File "/usr/lib/python2.7/dist-packages/django/db/models/manager.py", line 198, 
    in _insert return insert_query(self.model, values, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 1475, 
    in insert_query return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 791, 
    in execute_sql cursor = super(SQLInsertCompiler, self).execute_sql(None)
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 735, 
    in execute_sql cursor.execute(sql, params)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/postgresql_psycopg2/base.py", 
    line 44, in execute return self.cursor.execute(query, args)
DatabaseError: value too long for type character varying(64)

At first I thought perhaps it was a schema bug with PostgreSQL, but that seemed unlikely, since PostgreSQL is largely the Django RDBMS of choice. Sure enough, the error message wasn't lying: one or more graph names stored in the database were longer than the 64-character column limit. SQLite had happily accepted the oversized values, because it doesn't enforce VARCHAR length constraints. I queried SQLite to see which graphs had excessively long names.

$ echo 'select name from account_mygraph;' | \
    sqlite3 /opt/graphite/storage/graphite.db | \
    awk '{ print length(), $0 | "sort -n" }' | tail -5

55 General_Study.Average_Open_Percentage_over_Days_of_Week
58 20130702_DDoS_IR_338_Answered_Queries_vs_Inbound_Bandwidth
58 brussard.EWR.StatsProcessing.60min.AvgProcessing0529To0625
61 cvonwallenstein.DynECT_Email_Delivery_Triage_Core_Performance
68 DynECT_Email_Delivery_Triage_User_Specific_WeTransfer_Reports_August
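
If you'd rather catch these before loaddata blows up, you can also scan the fixture file itself. Here's a rough sketch; the record layout ("model", "pk", "fields") is standard Django serializer output, and the 64-character limit matches the character varying(64) column from the error above:

```python
import json

# PostgreSQL enforces the 64-character limit on the name column;
# SQLite silently accepted longer values.
NAME_LIMIT = 64

def find_long_values(records, limit=NAME_LIMIT):
    """Yield (model, field, value) for any string field longer than `limit`."""
    for record in records:
        for field, value in record.get("fields", {}).items():
            if isinstance(value, str) and len(value) > limit:
                yield record["model"], field, value

# Typical use:
#   with open("initial_data.json") as f:
#       records = json.load(f)
#   for model, field, value in find_long_values(records):
#       print(model, field, len(value))
```

This flags every over-length string field in one pass, not just graph names, which is handy if other columns have similar limits.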

I tracked down the owner of the graph and asked them to rename or delete it. After the offending graph was deleted, I dropped the database and started again with a fresh initial_data.json. Sadly, I ran into another conflict.

...
...
Problem installing fixture 'initial_data.json': Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/management/commands/loaddata.py", 
    line 174, in handle obj.save(using=using)
  File "/usr/lib/python2.7/dist-packages/django/core/serializers/base.py", line 165, 
    in save models.Model.save_base(self.object, using=using, raw=True)
  File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 526, 
    in save_base rows = manager.using(using).filter(pk=pk_val)._update(values)
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 507, 
    in _update return query.get_compiler(self.db).execute_sql(None)
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 869, 
    in execute_sql cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 735, 
    in execute_sql cursor.execute(sql, params)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/postgresql_psycopg2/base.py", 
    line 44, in execute return self.cursor.execute(query, args)
IntegrityError: duplicate key value violates unique constraint 
    "django_content_type_app_label_model_key"
DETAIL:  Key (app_label, model)=(contenttypes, contenttype) already exists.

After debugging the schema and fixture for a while, I discovered a thread on Stack Overflow describing the same error and a potential workaround. Apparently when you run Django's syncdb, it pre-populates its contenttypes framework; those same records also exist in the fixture produced by dumpdata (this is a Django bug imho). I needed a way to run syncdb to create the tables and indices, but then "reset" the problem table before performing the inserts.

I attempted the migration again, this time running syncdb without the initial fixture. This completed successfully.

$ sudo rm ./initial_data.json
$ sudo python manage.py syncdb --noinput
Creating tables ...
Creating table account_profile
Creating table account_variable
Creating table account_view
Creating table account_window
Creating table account_mygraph
Creating table dashboard_dashboard_owners
Creating table dashboard_dashboard
Creating table events_event
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table auth_message
Creating table django_session
Creating table django_admin_log
Creating table django_content_type
Creating table tagging_tag
Creating table tagging_taggeditem
Installing custom SQL ...
Installing indexes ...
No fixtures found.

I then proceeded to TRUNCATE the django_content_type table.

$ sudo python manage.py dbshell
Password for user graphite:
psql (9.3.2, server 9.3.1)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

graphite=> TRUNCATE django_content_type CASCADE;
NOTICE:  truncate cascades to table "auth_permission"
NOTICE:  truncate cascades to table "django_admin_log"
NOTICE:  truncate cascades to table "tagging_taggeditem"
NOTICE:  truncate cascades to table "auth_group_permissions"
NOTICE:  truncate cascades to table "auth_user_user_permissions"
TRUNCATE TABLE

I then performed a loaddata of the initial fixture, but I still hit the same error.

After a bit of fiddling, I finally got it working. Rather than truncating the table after the schema is loaded, I purged the offending contenttypes.contenttype model from the fixture and successfully ran syncdb on a fresh database.
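
The purge itself is simple enough to do in a few lines of Python. This is a sketch of the idea rather than the exact script I used; it drops every contenttypes.contenttype record from the fixture, since syncdb has already inserted those rows and the duplicates are what trip the unique constraint:

```python
import json

def strip_content_types(records):
    """Return fixture records minus any contenttypes.contenttype entries,
    which syncdb has already inserted into django_content_type."""
    return [r for r in records if r.get("model") != "contenttypes.contenttype"]

def rewrite_fixture(in_path, out_path):
    """Read a dumpdata fixture, filter it, and write the cleaned copy."""
    with open(in_path) as f:
        records = json.load(f)
    with open(out_path, "w") as f:
        json.dump(strip_content_types(records), f, indent=2)
```

With the filtered fixture in place as initial_data.json, syncdb loads it without hitting the constraint.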

$ sudo python manage.py syncdb --noinput
Creating tables ...
Creating table account_profile
Creating table account_variable
Creating table account_view
Creating table account_window
Creating table account_mygraph
Creating table dashboard_dashboard_owners
Creating table dashboard_dashboard
Creating table events_event
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table auth_message
Creating table django_session
Creating table django_admin_log
Creating table django_content_type
Creating table tagging_tag
Creating table tagging_taggeditem
Installing custom SQL ...
Installing indexes ...
Installed 1423 object(s) from 1 fixture(s)

The important takeaway from this experience is, again, to avoid running SQLite in production. If you have no choice, or are just now planning to migrate to PostgreSQL or MySQL based on my recommendations, be on the lookout for long graph names and be prepared to massage the fixture file to remove the conflicting contenttypes.contenttype model data.

Comments

at 2013-12-15 00:13:00, Engelbert wrote in to say...

$ echo 'select name from account_mygraph;' | \
    sqlite3 /opt/graphite/storage/graphite.db | \
    awk '{ print length(), $0 | "sort -n" }' | tail -5

Is SELECT length(name), name FROM account_mygraph ORDER BY length(name) DESC LIMIT 5 that difficult?

at 2013-12-15 08:15:29, Jason Dixon wrote in to say...

@Englebert - Nope, that's fine. Was the shell that difficult for you?

at 2013-12-15 10:28:30, Michael Jones wrote in to say...

Is there any reason not to modify models.py to increase the length beyond a measly 64 characters?

at 2014-04-07 16:54:18, Mark Crossfield wrote in to say...

Thought I’d leave a comment to thank you for this guide, and say that I really struggled with integrity errors importing the JSON export into Oracle. I think my problem went past contenttypes though. I’m not sure exactly what fixed it in the end (many schema purges later), but I found it much easier to leave out the whole contenttypes table in the export:

sudo ./manage.py dumpdata -e contenttypes > ~/dump-graphite-data-2014-03/initial_data.json.graphite001

Graphite & sqlite: never again.
