[Fixed]-Migrate Django model to unique_together constraint

29👍

If you are happy to choose one of the duplicates arbitrarily, I think the following might do the trick. Perhaps not the most efficient but simple enough and I guess you only need to run this once. Please verify this all works yourself on some test data in case I’ve done something silly, since you are about to delete a bunch of data.

First we find groups of objects which form duplicates. For each group, (arbitrarily) pick a “master” that we are going to keep. Our chosen method is to pick the one with lowest pk

from django.db.models import Min, Count

master_pks = MyModel.objects.values('A', 'B', 'C'
    ).annotate(Min('pk'), count=Count('pk')
    ).filter(count__gt=1
    ).values_list('pk__min', flat=True)

we then loop over each master, and delete all its duplicates

masters = MyModel.objects.in_bulk( list(master_pks) )

for master in masters.values():
    MyModel.objects.filter(a=master.a, b=master.b, c=master.c
        ).exclude(pk=master.pk).del_ACCIDENT_PREVENTION_ete()
👤second

0👍

I want to add a slightly improved answer that will delete everything in a single query, instead of looping and deleting for each duplicate group. This will be much faster if you have a lot of records.

non_dupe_pks = list(
    Model.objects.values('A', 'B', 'C')
    .annotate(Min('pk'), count=Count('pk'))
    .order_by()
    .values_list('pk__min', flat=True)
)

dupes = Model.objects.exclude(pk__in=non_dupe_pks)
dupes.delete()

It’s important to add order_by() in the first query otherwise the default ordering in the model might mess up with the aggregation.

You can comment out the last line and use dupes.count() to check if the query is working as expected.

👤Misao

Leave a comment