Appendix D. Testing Database Migrations

Django-migrations and its predecessor South have been around for ages, so it’s not usually necessary to test database migrations. But it just so happens that we’re introducing a dangerous type of migration, ie one that introduces a new integrity constraint on our data. When I first ran the migration script against staging, I saw an error.

On larger projects, where you have sensitive data, you may want the additional confidence that comes from testing your migrations in a safe environment before applying them to production data, so this toy example will hopefully be a useful rehearsal.

Another common reason to want to test migrations is for speed—migrations often involve downtime, and sometimes, when they’re applied to very large datasets, they can take time. It’s good to know in advance how long that might be.

An Attempted Deploy to Staging

Here’s what happened to me when I first tried to deploy our new validation constraints in Chapter 14:

$ cd deploy_tools
$ fab deploy:host=elspeth@superlists-staging.ottg.eu
[...]
Running migrations:
  Applying lists.0005_list_item_unique_together...Traceback (most recent call
last):
  File "/usr/local/lib/python3.3/dist-packages/django/db/backends/utils.py",
line 61, in execute
    return self.cursor.execute(sql, params)
  File
"/usr/local/lib/python3.3/dist-packages/django/db/backends/sqlite3/base.py",
line 475, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: columns list_id, text are not unique
[...]

What happened was that some of the existing data in the database violated the integrity constraint, so the database was complaining when I tried to apply it.

In order to deal with this sort of problem, we’ll need to build a “data migration”. Let’s first set up a local environment to test against.

Running a Test Migration Locally

We’ll use a copy of the live database to test our migration against.

Warning

Be very, very, very careful when using real data for testing. For example, you may have real customer email addresses in there, and you don’t want to accidentally send them a bunch of test emails. Ask me how I know this.

Entering Problematic Data

Start a list with some duplicate items on your live site, as shown in Figure D-1.

This list has 3 identical items
Figure D-1. A list with duplicate items

Copying Test Data from the Live Site

Copy the database down from live:

$ scp elspeth@superlists.ottg.eu:\
/home/elspeth/sites/superlists.ottg.eu/database/db.sqlite3 .
$ mv ../database/db.sqlite3 ../database/db.sqlite3.bak
$ mv db.sqlite3 ../database/db.sqlite3

Confirming the Error

We now have a local database that has not been migrated, and that contains some problematic data. We should see an error if we try to run migrate:

$ python3 manage.py migrate --migrate
python3 manage.py migrate
Operations to perform:
[...]
Running migrations:
[...]
  Applying lists.0005_list_item_unique_together...Traceback (most recent call
last):
[...]
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: columns list_id, text are not unique

Inserting a Data Migration

Data migrations are a special type of migration that modifies data in the database rather than changing the schema. We need to create one that will run before we apply the integrity constraint, to preventively remove any duplicates. Here’s how we can do that:

$ git rm lists/migrations/0005_list_item_unique_together.py
$ python3 manage.py makemigrations lists --empty
Migrations for 'lists':
  0005_auto_20140414_2325.py:
$ mv lists/migrations/0005_*.py lists/migrations/0005_remove_duplicates.py

Check out the Django docs on data migrations for more info, but here’s how we add some instructions to change existing data:

lists/migrations/0005_remove_duplicates.py

# encoding: utf8
from django.db import models, migrations

def find_dupes(apps, schema_editor):
    List = apps.get_model("lists", "List")
    for list_ in List.objects.all():
        items = list_.item_set.all()
        texts = set()
        for ix, item in enumerate(items):
            if item.text in texts:
                item.text = '{} ({})'.format(item.text, ix)
                item.save()
            texts.add(item.text)


class Migration(migrations.Migration):

    dependencies = [
        ('lists', '0004_item_list'),
    ]

    operations = [
        migrations.RunPython(find_dupes),
    ]

Re-creating the Old Migration

We re-create the old migration using makemigrations, which will ensure it is now the sixth migration and has an explicit dependency on 0005, the data migration:

$ python3 manage.py makemigrations
Migrations for 'lists':
  0006_auto_20140415_0018.py:
    - Alter unique_together for item (1 constraints)
$ mv lists/migrations/0006_* lists/migrations/0006_unique_together.py

Testing the New Migrations Together

We’re now ready to run our test against the live data:

$ cd deploy_tools
$ fab deploy:host=elspeth@superlists-staging.ottg.eu
[...]

We’ll need to restart the live Gunicorn job too:

elspeth@server:$ sudo restart gunicorn-superlists.ottg.eu

And we can now run our FTs against staging:

$ python3 manage.py test functional_tests --liveserver=superlists-staging.ottg.eu
Creating test database for alias 'default'...
....
 ---------------------------------------------------------------------
Ran 4 tests in 17.308s

OK

Everything seems in order! Let’s do it against live:

$ fab deploy --host=superlists.ottg.eu
[superlists.ottg.eu] Executing task 'deploy'
[...]

And that’s a wrap. git add lists/migrations, git commit, etc.

Conclusions

This exercise was primarily aimed at building a data migration and testing it against some real data. Inevitably, this is only a drop in the ocean of the possible testing you could do for a migration. You could imagine building automated tests to check that all your data was preserved, comparing the database contents before and after. You could write individual unit tests for the helper functions in a data migration. You could spend more time measuring the time taken for migrations, and experiment with ways to speed it up by, eg, breaking up migrations into more or fewer component steps.

Remember that this should be a relatively rare case. In my experience, I haven’t felt the need to test 99% of the migrations I’ve worked on. But, should you ever feel the need on your project, I hope you’ve found a few pointers here to get started with.

Get Test-Driven Development with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.