[Fixed]-Django: duplicates when filtering on many to many field

14👍

The 2 models in your example are represented with 3 tables: book, keyword and book_keyword relation table to manage M2M field.

When you use keywords__name in filter call Django is using SQL JOIN to merge all 3 tables. This allows you to filter objects in 1st table by values from another table.

The SQL will be like this:

SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE %fiction%)

After JOIN your data looks like

| Book Table          | Relation table                     | Keyword table                |
|---------------------|------------------------------------|------------------------------|
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
| 1       | Book 1    | 1                | 2               | 2          | Fiction         |
| 2       | Book 2    | 2                | 2               | 2          | Fiction         |

Then when data is loaded from DB into Python you only receive data from book table. As you can see the Book 1 is duplicated there

This is how Many-to-many relation and JOIN works

👤Igor

4👍

Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.

In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.

Leave a comment