Speeding up Django test runs by optimizing Factory Boy

From the very beginning at Rover, we’ve focused on making code deployment as fast and painless as possible. One important piece of our deployment infrastructure is Jenkins. As soon as we merge to master (via a pull request), Jenkins runs our test suite; if the suite passes, Jenkins automatically deploys the new version. As our app and our test suite have grown, these builds started taking longer than we’d like, so we decided to spend some time optimizing performance.

factory_boy is used very heavily in our test suite at Rover. In any given test, factories account for a meaningful portion of the total number of queries, so improving factory_boy performance should improve the performance of the entire test suite.

This article will focus on how we minimized database queries when creating two of our most common models: Person and Dog.

Measure First

To make measurement easy (and to replicate the environment our factories will see in the test suite), I first wrote a throwaway test case:

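import time

from django.test import TestCase

# PersonFactory and DogFactory come from your project's own factories module.

FACTORY_ITERATIONS = 1000


class FactoryCreateDurationTimingTests(TestCase):
    def _evaluate_factory_speed(self, factory_cls, attr='create', **kwargs):
        # Call the factory method repeatedly and report the mean time per call.
        start = time.time()
        for i in range(FACTORY_ITERATIONS):
            getattr(factory_cls, attr)(**kwargs)
        end = time.time()
        duration_each = (end - start) / FACTORY_ITERATIONS
        # int() truncates, so sub-millisecond calls are reported as ~0ms.
        print("\t{}.{}() takes ~{}ms.".format(
            factory_cls.__name__,
            attr,
            int(duration_each * 1000)))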

    def test_create_person(self):
        self._evaluate_factory_speed(PersonFactory)

    def test_build_person(self):
        self._evaluate_factory_speed(PersonFactory, attr='build')

    def test_create_dog(self):
        self._evaluate_factory_speed(DogFactory)

    def test_build_dog(self):
        self._evaluate_factory_speed(DogFactory, attr='build')

As the baseline, this test output:

DogFactory.create() takes ~100ms.
DogFactory.build() takes ~0ms.
PersonFactory.create() takes ~100ms.
PersonFactory.build() takes ~0ms.

Identify Queries

The easiest way I found to track down the source of queries is to insert an import pdb; pdb.set_trace() in the execute method of the appropriate backend.

This method lives in django/db/backends/(your database backend)/base.py.

Using standard pdb commands like w, you can see the call stack for each query that’s executed.
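If stepping through pdb for every query gets tedious, the same call-stack information can be logged non-interactively. The sketch below is not Django-specific: it assumes a plain sqlite3 connection from Python 3's standard library and uses its set_trace_callback hook. The names trace_queries and create_dog are hypothetical, chosen only for illustration. In a Django project the hook point would differ, so treat this as an illustration of the idea rather than a drop-in tool.

```python
import sqlite3
import traceback


def trace_queries(connection, log):
    """Record each SQL statement run on `connection` with the caller's stack."""

    def on_statement(sql):
        # extract_stack() ends with this callback's own frame; drop it.
        stack = traceback.extract_stack()[:-1]
        log.append((sql, stack))

    connection.set_trace_callback(on_statement)


def create_dog(connection, name):
    # Hypothetical stand-in for application code that issues a query.
    connection.execute("INSERT INTO dog (name) VALUES (?)", (name,))


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE dog (name TEXT)")

query_log = []
trace_queries(connection, query_log)
create_dog(connection, "Rex")

# Show each statement next to the innermost frames that led to it.
for sql, stack in query_log:
    callers = " > ".join(frame.name for frame in stack[-3:])
    print("{}  <- {}".format(sql, callers))
```

Each log entry pairs a statement with the frames that triggered it, which is the same information the w command gives you at a breakpoint, but collected for a whole run at once.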

The Primary Culprits

In our case, excess queries fell into 3 buckets:

Queries caused by factory_boy
Queries caused by excessive work in .save() methods
Queries caused by signal handlers or other code higher up the .save() call chain

Queries caused by factory_boy

The _setup_next_sequence method causes an additional query, but in our environment the sequence value isn’t used, so we override the parent class’s method and simply return 0.

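from factory import DjangoModelFactory


class YourFactory(DjangoModelFactory):
    @classmethod
    def _setup_next_sequence(cls):
        # Skip the query the default implementation runs to seed the sequence.
        return 0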

Any post_generation methods on a factory cause an additional .save() call.

These methods allow you to manipulate the record in question, but if the methods only create or change associated records, you don’t need that extra .save().

Removing those from the results keyword argument to _after_postgeneration eliminates the extra query:

class YourFactory(DjangoModelFactory):
    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        if results is not None:
            # Drop this hook's result so it doesn't trigger the extra save().
            results.pop('create_a_related_model', None)
        super(YourFactory, cls)._after_postgeneration(
            obj,
            create,
            results)

    @factory.post_generation
    def create_a_related_model(self, create, extracted, **kwargs):
        ...
        ...

Queries caused by work in .save() methods

By convention on our team, any additional work done in .save() methods can be disabled via flags passed to the .save() method in question.

An example is worth 1000 words, so:

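import factory
from django.db import models


class YourModel(models.Model):
    def save(self, *args, **kwargs):
        # Pop the flag so it isn't forwarded to Model.save().
        if kwargs.pop('do_something_expensive', True):
            do_something_expensive()  # placeholder for the costly work
        super(YourModel, self).save(*args, **kwargs)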


class SaveFlagsDjangoModelFactory(factory.DjangoModelFactory):
    """
    Allows a factory to handle accepting whitelisted kwargs to
    the ._create method.
    """
    @classmethod
    def _create(cls, target_class, *args, **kwargs):
        """
        Due to factory_boy passing through kwargs directly into the
        get_or_create call (which will query with them) we reimplement
        the functionality of the DjangoModelFactory class with slight tweaks.
        """
        save_flags = cls._get_save_flag_kwargs()
        save_flag_kwargs = {}
        for flag, default in save_flags:
            save_flag_kwargs[flag] = kwargs.pop(flag, default)

        if cls.FACTORY_DJANGO_GET_OR_CREATE:
            fields = cls.FACTORY_DJANGO_GET_OR_CREATE
            filter_data = {}
            for field in fields:
                if field in kwargs:
                    filter_data[field] = kwargs[field]
            try:
                return cls.FACTORY_FOR.objects.get(**filter_data)
            except cls.FACTORY_FOR.DoesNotExist:
                pass

        obj = cls.FACTORY_FOR(*args, **kwargs)
        obj.save(**save_flag_kwargs)
        return obj

    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        """
        Duplicate the behavior of DjangoModelFactory._after_postgeneration,
        except that the flags and defaults returned by
        _get_save_flag_kwargs() are passed to .save().
        """
        if create and results:
            kwargs = dict(cls._get_save_flag_kwargs())
            obj.save(**kwargs)


class YourModelFactory(SaveFlagsDjangoModelFactory):
    FACTORY_FOR = YourModel

    @classmethod
    def _get_save_flag_kwargs(cls):
        return (
            ('do_something_expensive', False),
        )
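The flag handling at the top of _create can be looked at in isolation. The helper below is a minimal sketch of that step only. The name split_save_flags is hypothetical and not part of factory_boy; it assumes the same (flag_name, default) pairs that _get_save_flag_kwargs() returns above.

```python
def split_save_flags(save_flags, kwargs):
    """Separate whitelisted save() flags from model field kwargs.

    `save_flags` is an iterable of (flag_name, default) pairs.
    (Hypothetical helper, for illustration only.)
    """
    field_kwargs = dict(kwargs)  # copy so the caller's dict is untouched
    flag_kwargs = {}
    for flag, default in save_flags:
        # Use the caller's value if given, else the factory's default.
        flag_kwargs[flag] = field_kwargs.pop(flag, default)
    return field_kwargs, flag_kwargs


SAVE_FLAGS = (("do_something_expensive", False),)

# Default: the expensive work stays off.
fields, flags = split_save_flags(SAVE_FLAGS, {"name": "Rex"})
# fields == {'name': 'Rex'}; flags == {'do_something_expensive': False}

# A test that needs the expensive work can opt back in explicitly.
fields, flags = split_save_flags(
    SAVE_FLAGS, {"name": "Rex", "do_something_expensive": True})
# fields == {'name': 'Rex'}; flags == {'do_something_expensive': True}
```

Because the flags are popped before the model is instantiated, they never reach the model constructor or the get_or_create lookup. A test that depends on the expensive behavior should be able to re-enable it by passing the flag to the factory call.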

Queries caused by signals, etc.

These are best worked around by patching the associated behavior during your test runs, then exposing a decorator that re-enables it for individual tests that rely on, or directly test, that behavior. For example:

from functools import wraps

from django.test import TestCase
from mock import patch


class Utility(object):
    def something_skippable(self):
        ...
        ...


class BaseTestCase(TestCase):
    def setUp(self):
        self._something_skippable_patcher = patch.object(
            Utility,
            'something_skippable')
        self._something_skippable_patcher.start()
        super(BaseTestCase, self).setUp()

    def tearDown(self):
        self._something_skippable_patcher.stop()
        super(BaseTestCase, self).tearDown()


def enable_something_skippable(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        self._something_skippable_patcher.stop()
        try:
            result = func(self, *args, **kwargs)
        finally:
            self._something_skippable_patcher.start()
        return result
    return wrapper


class IndividualTestCase(BaseTestCase):
    @enable_something_skippable
    def test_where_we_need_something_skippable(self):
        ...
        ...
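To check the pattern end to end without a Django project, here is a self-contained sketch of the same idea. It assumes Python 3, with unittest.mock from the standard library in place of the mock package and unittest.TestCase in place of Django's TestCase. The return values are made up so the two tests have something to assert on.

```python
import unittest
from functools import wraps
from unittest.mock import patch


class Utility(object):
    def something_skippable(self):
        # Stand-in for slow work such as a signal handler's side effects.
        return "expensive result"


class BaseTestCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Replace the slow method with a mock for every test by default.
        self._something_skippable_patcher = patch.object(
            Utility, "something_skippable", return_value=None)
        self._something_skippable_patcher.start()

    def tearDown(self):
        self._something_skippable_patcher.stop()
        super().tearDown()


def enable_something_skippable(func):
    """Run the decorated test with the real method restored."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        self._something_skippable_patcher.stop()
        try:
            return func(self, *args, **kwargs)
        finally:
            # Re-start so tearDown's stop() has an active patch to undo.
            self._something_skippable_patcher.start()
    return wrapper


class IndividualTestCase(BaseTestCase):
    def test_patched_by_default(self):
        self.assertIsNone(Utility().something_skippable())

    @enable_something_skippable
    def test_real_behavior_when_enabled(self):
        self.assertEqual(
            Utility().something_skippable(), "expensive result")
```

Saved as a module, this can be run with python -m unittest. The first test sees the mock, and the decorated test sees the real method.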

Results

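Overall, we were able to speed up our factories dramatically: create() dropped from roughly 100ms per call to about 3ms for DogFactory and about 1ms for PersonFactory.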

DogFactory.create() takes ~3ms.
DogFactory.build() takes ~0ms.
PersonFactory.create() takes ~1ms.
PersonFactory.build() takes ~0ms.

Improving the speed of our factories ultimately resulted in a 50% decrease in our overall test runtime, exactly the kind of improvement we’d hoped to see!