Introducing django-perf-rec, our Django performance testing tool
During PyCon UK I had the opportunity to work on open-sourcing our in-house Django performance testing tool, which has now been released as django-perf-rec. We created it over two years ago, and have been using and improving it since. It has been helping us to pre-emptively fix performance problems in our code, and now it can help you!
In the old days, we would often see performance regressions when we introduced a new feature to existing code, and we'd have to retroactively understand and fix them. For example, we might add a feature that accesses a new foreign key on a model, and because prefetch_related hadn't been added to the appropriate QuerySet, we'd see an N+1 query problem appear. Often the problems would only manifest as real slowness in production, since our test and development environments don't contain much data.
We tried to lock these down with Django's assertNumQueries (docs) in tests like this:
This worked on a basic level, but if the test failed on you, you'd be left with little information about what caused the failure, forcing you to manually trace the code path and work out where the change had come from.
Failures happened often enough for us that we started adding comments by assertNumQueries to track roughly what the expected queries were, to make retracing easier:
Suffice it to say, this wasn't fun. If a query was added or removed, you'd have to manually edit all the comments. They could also still 'rot' and become inaccurate - if one query was removed whilst another was added elsewhere, the test would continue to pass but the comments would fall out of sync, making debugging at the next failure harder.
Then we had the insight that the comments actually contained data, and that this data could be captured and written down by a tool automatically...
From this idea, django-perf-rec was born. When active, it intercepts all database queries (and cache operations too!) and writes them out to a YAML file that lives next to the test. When the test runs again, it compares the newly captured data against the record in the file, and fails if there are any differences. Thus the above test can now be written as:
It also deals with variable data changing in your SQL and cache keys by fingerprinting it. For example, the YAML for the above test might look like the following - note the SQL parameters have been replaced with #, and the column list in the SELECT has been collapsed to '...':
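A record file might look roughly like this (illustrative - the exact fingerprinting depends on the query, and the cache line shows the shape a cache operation would take):

```yaml
UserListTests.test_user_list:
- db: 'SELECT ... FROM auth_user WHERE auth_user.username = #'
- cache|get: 'user.#.profile'
```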
When a failure happens, you get the exact comparison between the old and new lists, making it easy to understand why a change has happened. (We're using pytest, which gives us nice output - PRs accepted for improving the output on other test runners!) If the changes are acceptable, you can just delete the YAML file and rerun the test to regenerate it, checking the changes in as part of your diff. If not, you have a lot more information to use in finding the problematic code.
It works with parallel test running (we use pytest-xdist), so the YAML files don't get corrupted whilst multiple processes write to them. It also comes with a TestCase mixin so you don't have to import it in every file you want to use it in!
Check it out today at https://github.com/YPlan/django-perf-rec and if you need an improvement please open an issue, or better yet, a pull request!