29👍
If you are happy to choose one of the duplicates arbitrarily, I think the following might do the trick. Perhaps not the most efficient but simple enough and I guess you only need to run this once. Please verify this all works yourself on some test data in case I’ve done something silly, since you are about to delete a bunch of data.
First we find groups of objects which form duplicates. For each group, (arbitrarily) pick a “master” that we are going to keep. Our chosen method is to pick the one with lowest pk
from django.db.models import Min, Count
master_pks = MyModel.objects.values('A', 'B', 'C'
).annotate(Min('pk'), count=Count('pk')
).filter(count__gt=1
).values_list('pk__min', flat=True)
we then loop over each master, and delete all its duplicates
masters = MyModel.objects.in_bulk( list(master_pks) )
for master in masters.values():
MyModel.objects.filter(a=master.a, b=master.b, c=master.c
).exclude(pk=master.pk).del_ACCIDENT_PREVENTION_ete()
0👍
I want to add a slightly improved answer that will delete everything in a single query, instead of looping and deleting for each duplicate group. This will be much faster if you have a lot of records.
non_dupe_pks = list(
Model.objects.values('A', 'B', 'C')
.annotate(Min('pk'), count=Count('pk'))
.order_by()
.values_list('pk__min', flat=True)
)
dupes = Model.objects.exclude(pk__in=non_dupe_pks)
dupes.delete()
It’s important to add order_by()
in the first query otherwise the default ordering in the model might mess up with the aggregation.
You can comment out the last line and use dupes.count()
to check if the query is working as expected.
- How can I tell Django templates not to parse a block containing code that looks like template tags?
- Django update on queryset to change ID of ForeignKey
- Django order items by two fields, but ignoring them if they're zero