Django signals for consistent caching

June 27, 2010 Tags: django, sql, caching

Because queries are much nicer when they're not executed.

This is something I've been playing with over the last few days. When I wrote my blog engine, I wasn't crazy about database optimization yet [1]. I recently had a look at the queries generated to render the homepage and noticed that I had too many extra queries for each article.

The Problem

Comments

I use contrib.comments and its set of template tags to generate the forms, render the comments and display the comment count for each blog entry on the homepage. For example:

{% get_comment_count for entry as comment_count %}

{{ comment_count }} comment{{ comment_count|pluralize }}.

This code issues a COUNT query. It's no big deal when there is a single entry, but the homepage loops over a list of entries, so the result is one COUNT query per entry: O(N) queries for N entries.

Tagging

In the same way, when you register a model with django-tagging, the tags are fetched only when they are accessed. For example:

{% with entry.tags as tags %}
  {% if tags %}Tags:
    {% for tag in tags %}
      <a href="{% url blog:tag tag %}">{{ tag }}</a>
    {% endfor %}
  {% endif %}
{% endwith %}

Again, this code executed for every listed entry will result in O(N) queries. Even worse, if I remove the {% with %} statement, the tags are fetched twice: once for the {% if tags %} check, and once for the {% for %} loop.
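The double fetch is easier to see outside the template language. Here is a toy model in plain Python, not django-tagging itself: `FakeEntry` and its `fetches` counter are hypothetical stand-ins, assuming only that every access to `entry.tags` triggers a new fetch.

```python
class FakeEntry:
    """Hypothetical stand-in: each access to .tags counts as one query."""

    def __init__(self, tag_names):
        self._tag_names = list(tag_names)
        self.fetches = 0

    @property
    def tags(self):
        self.fetches += 1
        return list(self._tag_names)


def render_without_with(entry):
    # Mirrors {% if entry.tags %}{% for tag in entry.tags %}: two accesses.
    names = []
    if entry.tags:
        for tag in entry.tags:
            names.append(tag)
    return names


def render_with_with(entry):
    # Mirrors {% with entry.tags as tags %}: one access, result reused.
    tags = entry.tags
    names = []
    if tags:
        for tag in tags:
            names.append(tag)
    return names
```

Both functions return the same names, but the first one bumps `fetches` twice per entry and the second only once.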

The Solution

Tags and comments have two things in common: they don't change often (at least on my blog), and they're only attached to blog entries. So the idea is to add two columns to the entry model and use them to cache the tags and the comment count persistently. The key to keeping the cached values in sync with the real data is to use Django signals.

Let's add some fields to our “Blog Entry” model. The tags will be stored as a single comma-separated string, so we need a method that returns them as a proper list.

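from django.db import models
from django.utils.translation import gettext_lazy as _

class Entry(models.Model):
    # Your fields here...
    # Denormalized copies of data that lives in other tables.
    comment_count = models.PositiveIntegerField(_('Comment count'), default=0)
    cached_tags = models.CharField(_('Tags'), max_length=1023, blank=True)

    def get_cached_tags(self):
        """Return the cached tag names as a list, or None if there are none."""
        if self.cached_tags:
            return self.cached_tags.split(',')
        return None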
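The comma-separated format has two edge cases worth spelling out. Here is a minimal sketch in plain Python, independent of the model: the helper names `serialize_tags` and `parse_cached_tags` are hypothetical, and returning an empty list instead of `None` is just one possible choice.

```python
def serialize_tags(tag_names):
    """Join tag names into the comma-separated form stored in the column."""
    for name in tag_names:
        # A comma inside a tag name would be split into two tags on read.
        if ',' in name:
            raise ValueError('tag names must not contain commas: %r' % name)
    return ','.join(tag_names)


def parse_cached_tags(cached_tags):
    """Split the stored string back into a list of tag names."""
    # ''.split(',') returns [''], not [], hence the emptiness check.
    if not cached_tags:
        return []
    return cached_tags.split(',')
```

The round trip only works as long as no tag name contains the separator, and the joined string still has to fit in the column's `max_length`.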

Then we need a function to update our comment count. This is a receiver function, which we will connect to Django signals.

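from django.conf import settings
from django.contrib.comments.models import Comment

def update_comment_count(sender, instance, created, **kwargs):
    # Recount from scratch rather than incrementing, so the cached value
    # stays right when a saved comment is hidden or flagged as removed.
    count = Comment.objects.filter(site=settings.SITE_ID,
                                   object_pk=instance.object_pk,
                                   content_type=instance.content_type,
                                   is_public=True,
                                   is_removed=False).count()
    # update() writes the column directly, without calling Entry.save().
    Entry.objects.filter(pk=instance.object_pk).update(comment_count=count)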

Another function to update the cached tags:

from django.contrib.contenttypes.models import ContentType
from tagging.models import TaggedItem

def update_tags(sender, instance, created, **kwargs):
    entry = instance.object  # the tagged object, assumed to be an Entry
    ctype = ContentType.objects.get_for_model(Entry)
    # values_list() follows tag__name itself; select_related() is not needed.
    tags = TaggedItem.objects.filter(
        content_type=ctype, object_id=entry.pk
    ).values_list('tag__name', flat=True)
    entry.cached_tags = ','.join(tags)
    entry.save()

To keep the cached attributes in sync with our data, we need to register the functions to update the values each time a comment is posted and each time an object is tagged. This is done by connecting the receiver functions to a post_save signal:

models.signals.post_save.connect(update_comment_count, sender=Comment)
models.signals.post_save.connect(update_tags, sender=TaggedItem)

Now you can add tags and comments to your blog entries and the cached values will be updated automatically. Note, however, that if you need the values to be updated when a comment or a tag is deleted, you also need to connect the receiver functions to the post_delete signal (and get rid of the created argument, which post_delete doesn't send). Also, a QuerySet.update() query doesn't send any signals at all, so you may want to update the cache manually after running one.

That's it! Down from 30 to 3 queries, from O(N) to O(1).
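The drop in query count can be reproduced outside Django. What follows is only a sketch with the standard sqlite3 module, not the blog's actual schema: the `entry` and `comment` tables, the `QueryCounter` wrapper and both `homepage_*` functions are assumptions made for the illustration.

```python
import sqlite3


class QueryCounter:
    """Thin wrapper that counts the SELECT statements it runs."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def fetchall(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_entries, comments_per_entry):
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT,
                            comment_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE comment (id INTEGER PRIMARY KEY,
                              entry_id INTEGER NOT NULL);
    """)
    for i in range(1, n_entries + 1):
        # comment_count is filled in here; the receivers do this in the post.
        conn.execute('INSERT INTO entry VALUES (?, ?, ?)',
                     (i, 'Entry %d' % i, comments_per_entry))
        conn.executemany('INSERT INTO comment (entry_id) VALUES (?)',
                         [(i,)] * comments_per_entry)
    conn.commit()
    return conn


def homepage_with_count_queries(db):
    # One query for the list, then one COUNT per entry: N + 1 in total.
    entries = db.fetchall('SELECT id, title FROM entry ORDER BY id')
    return [(title, db.fetchall(
        'SELECT COUNT(*) FROM comment WHERE entry_id = ?', (pk,))[0][0])
        for pk, title in entries]


def homepage_with_cached_column(db):
    # The count is already on the row: a single query.
    return db.fetchall('SELECT title, comment_count FROM entry ORDER BY id')
```

With 10 entries, the first version runs 11 queries and the second runs 1, and both return the same rows.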


[1] At the time I was on shared hosting with plenty of resources available and a fast database server, and now I have my own VPS with a dozen Django sites running on it. So optimization kinda makes sense now.

Comments

March 9, 2011, poswald

You may want to consider adding 'dispatch_uid' to your connect call now. Basically if the file with the connect gets imported multiple times, the connect will be called multiple times as well. The dispatch uid prevents this. I find that always adding it means things will be a bit more reliable over time as imports get added.
