[Fixed]-Django. Thread safe update or create.


from django.db import IntegrityError

def update_or_create(model, filter_kwargs, update_kwargs)
    if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
        kwargs = filter_kwargs.copy()
        except IntegrityError:
            if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
                raise  # re-raise IntegrityError

I think, code provided in the question is not very demonstrative: who want to set id for model?
Lets assume we need this, and we have simultaneous operations:

def thread1():
    update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 1})

def thread2():
    update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 2})

With update_or_create function, depends on which thread comes first, object will be created and updated with no exception. This will be thread-safe, but obviously has little use: depends on race condition value of SomeModek.objects.get(some__unique_field=1).some_field could be 1 or 2.

Django provides F objects, so we can upgrade our code:

from django.db.models import F

def thread1():
                     {'some_field': F('some_field') + 1})

def thread2():
                     {'some_field': F('some_field') + 2})


You want django’s select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.

    with transaction.commit_on_success():
        SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError: #unique id already exists, so update instead
    with transaction.commit_on_success():
        object = SomeModel.objects.select_for_update().get(pk=1)

Note that if some other process deletes the object between the two queries, you’ll get a SomeModel.DoesNotExist exception.

Django 1.7 and above also has atomic operation support and a built-in update_or_create() method.



You can use Django’s built-in get_or_create, but that operates on the model itself, rather than a queryset.

You can use that like this:

me = SomeModel.objects.get_or_create(id=1)
me.some_field = 100

If you have multiple threads, your app will need to determine which instance of the model is correct. Usually what I do is refresh the model from the database, make changes, and then save it, so you don’t have a long time in a disconnected state.



It’s impossible in django do such upsert operation, with update. But queryset update method return number of filtered fields so you can do:

from django.db import router, connections, transaction

class MySuperManager(models.Manager):
     def _lock_table(self, lock='ACCESS EXCLUSIVE'):
         cursor = connections[router.db_for_write(self.model)]
            'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)

     def create_or_update(self, id, **update_fields): 
         with transaction.commit_on_success():            
             if not self.get_query_set().filter(id=id).update(**update_fields):
                self.model(id=id, **update_fields).save()

this example if for postgres, you can use it without sql code, but update or insert operation will not be atomic. If you create a lock on table you will be sure that two objects will be not created in two other threads.


I think if you have critical demands on atom operations. You’d better design it in database level instead of Django ORM level.

Django ORM system is focusing on convenience instead of performance and safety. You have to optimize the automatic generated SQL sometimes.

“Transaction” in most productive databases provide database lock and rollback well.

In mashup(hybrid) systems, or say your system added some 3rd part components, like logging, statistics. Application in different framework or even language may access database at the same time, adding thread safe in Django is not enough in this case.



Leave a comment