From the very beginning at Rover, we’ve focused on making deploying code as fast and painless as possible. One important piece of our deployment infrastructure is Jenkins. As soon as we merge to master (via a pull request), Jenkins runs our test suite; if the suite passes, Jenkins automatically deploys the new version. As our app and our test suite have grown, these builds started taking longer than we’d like, so we decided to spend some time optimizing performance.
factory_boy is used very heavily in our test suite at Rover. In any given test, factories account for a meaningful portion of the total number of queries, so improving factory_boy performance should improve the performance of the entire test suite.
This article will focus on how we minimized database queries when creating two of our most common models: Person and Dog.
Measure First
To make measurement easy (and to replicate the environment our factories will see in the test suite), I first wrote a throwaway test case:
import time

from django.test import TestCase

# PersonFactory and DogFactory come from your project's own factories module.

FACTORY_ITERATIONS = 1000

class FactoryCreateDurationTimingTests(TestCase):

    def _evaluate_factory_speed(self, factory_cls, attr='create', **kwargs):
        # Call the factory method repeatedly and report the mean time per call.
        start = time.time()
        for i in range(FACTORY_ITERATIONS):
            getattr(factory_cls, attr)(**kwargs)
        end = time.time()
        duration_each = (end - start) / FACTORY_ITERATIONS
        print("\t{}.{}() takes ~{}ms.".format(
            factory_cls.__name__,
            attr,
            int(duration_each * 1000)))

    def test_create_person(self):
        self._evaluate_factory_speed(PersonFactory)

    def test_build_person(self):
        self._evaluate_factory_speed(PersonFactory, attr='build')

    def test_create_dog(self):
        self._evaluate_factory_speed(DogFactory)

    def test_build_dog(self):
        self._evaluate_factory_speed(DogFactory, attr='build')
As the baseline, this test output:
DogFactory.create() takes ~100ms.
DogFactory.build() takes ~0ms.
PersonFactory.create() takes ~100ms.
PersonFactory.build() takes ~0ms.
Identify Queries
The easiest way I found to track down the source of queries is to insert a pdb.set_trace() call in the execute method of the appropriate backend.
This method lives in django/db/backends/(your database backend)/base.py.
Using standard pdb commands like w, you can see the call stack for each query that’s executed.
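If stepping through every query by hand gets tedious, the same call-stack information can be captured non-interactively. The sketch below is only an illustration using the standard library's sqlite3 and traceback modules under Python 3, not Django's backend code: TracingCursor, QUERY_LOG, and create_dog are hypothetical names, and the overridden execute stands in for the breakpoint described above.

```python
import sqlite3
import traceback

# Hypothetical log of (sql, call stack) pairs, one entry per query.
QUERY_LOG = []


class TracingCursor(sqlite3.Cursor):
    """Cursor that records who asked for each query before running it."""

    def execute(self, sql, *args):
        # format_stack() ends with this frame; drop it so the last
        # entry is the code that called execute().
        stack = traceback.format_stack()[:-1]
        QUERY_LOG.append((sql, stack))
        return super().execute(sql, *args)


def create_dog(cursor, name):
    # Stand-in for application code whose queries we want to trace.
    cursor.execute("INSERT INTO dog (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
cursor = conn.cursor(factory=TracingCursor)
cursor.execute("CREATE TABLE dog (name TEXT)")
create_dog(cursor, "Rex")

for sql, stack in QUERY_LOG:
    print(sql)
    print(stack[-1])  # innermost caller, like the bottom of pdb's `w`
```

Grouping the logged stacks by their innermost frames is a quick way to see which code path issues the most queries before deciding where to set a real breakpoint.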
The Primary Culprits
In our case, excess queries fell into 3 buckets:
- Queries caused by factory_boy
- Queries caused by excessive work in .save() methods
- Queries caused by signal handlers or other code higher up the .save() call chain
Queries caused by factory_boy
The _setup_next_sequence method causes an additional query, but in our environment the sequence value isn’t used, so we override the parent class’s method and simply return 0.
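from factory import DjangoModelFactory

class YourFactory(DjangoModelFactory):
    @classmethod
    def _setup_next_sequence(cls):
        # Skip the query the parent class would run to seed the sequence.
        return 0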
Any post_generation methods on a factory cause an additional .save() call.
These methods allow you to manipulate the record in question, but if the methods only create or change associated records, you don’t need that extra .save().
Removing those from the results keyword argument to _after_postgeneration eliminates the extra query:
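import factory
from factory import DjangoModelFactory

class YourFactory(DjangoModelFactory):

    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        # Drop hooks that only touch related records so the parent class
        # does not issue the extra obj.save() on their account.
        if results is not None:
            results.pop('create_a_related_model', None)
        super(YourFactory, cls)._after_postgeneration(
            obj,
            create,
            results)

    @factory.post_generation
    def create_a_related_model(self, create, extracted, **kwargs):
        ...

    ...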
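To make the effect concrete without a database, here is a toy model of that hook, written under the assumption that the parent class re-saves the object whenever results is non-empty. Record, ToyFactory, NoisyFactory, and QuietFactory are hypothetical stand-ins for illustration, not factory_boy classes.

```python
class Record(object):
    """Stand-in for a model instance; counts how often it is saved."""

    def __init__(self):
        self.save_count = 0

    def save(self):
        self.save_count += 1


class ToyFactory(object):
    hooks = ()  # names of post-generation hooks to run after creation

    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        # Assumed parent behavior: save again if any hook ran.
        if create and results:
            obj.save()

    @classmethod
    def create(cls):
        obj = Record()
        obj.save()  # the initial INSERT
        results = {name: getattr(cls, name)(obj) for name in cls.hooks}
        cls._after_postgeneration(obj, True, results)
        return obj


class NoisyFactory(ToyFactory):
    hooks = ('create_a_related_model',)

    @staticmethod
    def create_a_related_model(obj):
        return None  # only touches related records, never obj itself


class QuietFactory(NoisyFactory):
    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        # Hide the hook's result so the parent sees nothing to re-save.
        if results is not None:
            results.pop('create_a_related_model', None)
        super(QuietFactory, cls)._after_postgeneration(obj, create, results)


print(NoisyFactory.create().save_count)  # 2: initial save plus the extra one
print(QuietFactory.create().save_count)  # 1: the extra save is skipped
```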
Queries caused by work in .save() methods
By convention on our team, any additional work done in .save() methods can be disabled via flags passed to the .save() method in question.
An example is worth 1000 words, so:
import factory
from django.db import models


class YourModel(models.Model):

    def save(self, *args, **kwargs):
        # Pop our flag so Django's own save() never sees it.
        run_expensive = kwargs.pop('do_something_expensive', True)
        if run_expensive:
            do_something_expensive()  # placeholder for the expensive work
        super(YourModel, self).save(*args, **kwargs)


class SaveFlagsDjangoModelFactory(factory.DjangoModelFactory):
    """
    Allows a factory to handle accepting whitelisted kwargs to
    the ._create method.
    """

    @classmethod
    def _create(cls, target_class, *args, **kwargs):
        """
        Due to factory_boy passing through kwargs directly into the
        get_or_create call (which will query with them) we reimplement
        the functionality of the DjangoModelFactory class with slight tweaks.
        """
        # Separate the save flags from the model field values.
        save_flags = cls._get_save_flag_kwargs()
        save_flag_kwargs = {}
        for flag, default in save_flags:
            save_flag_kwargs[flag] = kwargs.pop(flag, default)

        if cls.FACTORY_DJANGO_GET_OR_CREATE:
            fields = cls.FACTORY_DJANGO_GET_OR_CREATE
            filter_data = {}
            for field in fields:
                if field in kwargs:
                    filter_data[field] = kwargs[field]
            try:
                return cls.FACTORY_FOR.objects.get(**filter_data)
            except cls.FACTORY_FOR.DoesNotExist:
                pass

        obj = cls.FACTORY_FOR(*args, **kwargs)
        obj.save(**save_flag_kwargs)
        return obj

    @classmethod
    def _after_postgeneration(cls, obj, create, results=None):
        """
        Duplicate behavior of DjangoModelFactory _after_postgeneration,
        except that it passes the flags and defaults returned by
        _get_save_flag_kwargs().
        """
        if create and results:
            kwargs = dict(cls._get_save_flag_kwargs())
            obj.save(**kwargs)


class YourModelFactory(SaveFlagsDjangoModelFactory):
    FACTORY_FOR = YourModel

    @classmethod
    def _get_save_flag_kwargs(cls):
        # (flag name, default value) pairs applied to every save().
        return (
            ('do_something_expensive', False),
        )
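The core of _create is just splitting the incoming kwargs into save flags and model fields. Here is a stripped-down sketch of that one step in plain Python, with no Django or factory_boy involved; split_save_flags, SAVE_FLAGS, and the name field are hypothetical, chosen only for illustration.

```python
def split_save_flags(kwargs, save_flags):
    """Separate whitelisted save flags from model field values.

    save_flags is an iterable of (flag_name, default) pairs, the same
    shape that _get_save_flag_kwargs() returns.
    Returns (field_kwargs, save_flag_kwargs) without mutating the input.
    """
    field_kwargs = dict(kwargs)
    save_flag_kwargs = {}
    for flag, default in save_flags:
        # A flag passed by the caller wins; otherwise use the default.
        save_flag_kwargs[flag] = field_kwargs.pop(flag, default)
    return field_kwargs, save_flag_kwargs


SAVE_FLAGS = (('do_something_expensive', False),)

# Caller says nothing about the flag: the factory default (False) applies.
fields, flags = split_save_flags({'name': 'Rex'}, SAVE_FLAGS)
print(fields, flags)

# Caller opts back in for a test that needs the expensive work.
fields, flags = split_save_flags(
    {'name': 'Rex', 'do_something_expensive': True}, SAVE_FLAGS)
print(fields, flags)
```

Because the defaults live on the factory, tests get the cheap path automatically and can still pass do_something_expensive=True to a factory call when they need the real behavior.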
Queries caused by signals, etc.
These are best worked around by patching the associated behavior during your test runs, then exposing a decorator that re-enables it for individual tests that rely on that behavior or are actually testing it. For example:
from functools import wraps

from django.test import TestCase
from mock import patch


class Utility(object):

    def something_skippable(self):
        ...

    ...


class BaseTestCase(TestCase):

    def setUp(self):
        # Patch the skippable behavior out for every test by default.
        self._something_skippable_patcher = patch.object(
            Utility,
            'something_skippable')
        self._something_skippable_patcher.start()
        super(BaseTestCase, self).setUp()

    def tearDown(self):
        self._something_skippable_patcher.stop()
        super(BaseTestCase, self).tearDown()


def enable_something_skippable(func):
    # Decorator: run one test with the real behavior, then re-patch.
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        self._something_skippable_patcher.stop()
        try:
            result = func(self, *args, **kwargs)
        finally:
            self._something_skippable_patcher.start()
        return result
    return wrapper


class IndividualTestCase(BaseTestCase):

    @enable_something_skippable
    def test_where_we_need_something_skippable(self):
        ...

    ...
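A quick way to convince yourself the decorator behaves as intended is to run the same pattern against a throwaway class. This sketch assumes Python 3, where mock ships as unittest.mock, and uses plain unittest.TestCase instead of Django's; the 'real' and 'patched' return values and the two test names are made up for the check.

```python
import unittest
from functools import wraps
from unittest.mock import patch


class Utility(object):
    def something_skippable(self):
        return 'real'


class BaseTestCase(unittest.TestCase):
    def setUp(self):
        # The replacement mock returns 'patched' so tests can tell it apart.
        self._something_skippable_patcher = patch.object(
            Utility, 'something_skippable', return_value='patched')
        self._something_skippable_patcher.start()
        super().setUp()

    def tearDown(self):
        self._something_skippable_patcher.stop()
        super().tearDown()


def enable_something_skippable(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # Lift the patch for this test only, then put it back so that
        # tearDown() can stop it as usual.
        self._something_skippable_patcher.stop()
        try:
            return func(self, *args, **kwargs)
        finally:
            self._something_skippable_patcher.start()
    return wrapper


class IndividualTestCase(BaseTestCase):
    def test_patched_by_default(self):
        self.assertEqual(Utility().something_skippable(), 'patched')

    @enable_something_skippable
    def test_real_behavior_when_enabled(self):
        self.assertEqual(Utility().something_skippable(), 'real')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(IndividualTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Restarting the patcher in the finally clause matters: tearDown() still calls stop(), so the patch needs to be active again by the time the test method returns, even when the test fails.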
Results
Overall, we were able to massively speed up our factories, cutting create() from ~100ms to ~3ms or less per call:
DogFactory.create() takes ~3ms.
DogFactory.build() takes ~0ms.
PersonFactory.create() takes ~1ms.
PersonFactory.build() takes ~0ms.
Improving the speed of our factories ultimately resulted in a 50% decrease in our overall test runtime – exactly the kind of improvement we’d hoped to see!