If you use an array field
- The size of each row in your DB is going to be a bit large thus Postgres is going to be using a lot more toast tables (http://www.postgresql.org/docs/9.5/static/storage-toast.html)
- Every time you get the row, unless you specifically use
defer(https://docs.djangoproject.com/en/1.9/ref/models/querysets/#defer) the field or otherwise exclude it from the query via
valuesor something, you paying the cost of loading all those values every time you iterate across that row. If that’s what you need then so be it.
- Filtering based on values in that array, while possible isn’t going to be as nice and the Django ORM doesn’t make it as obvious as it does for M2M tables.
If you use M2M
- You can filter more easily on those related values
- Those fields are postponed by default, you can use
prefetch_relatedif you need them and then get fancy if you want only a subset of those values loaded
- Total storage in the DB is going to be slightly higher with M2M because of keys, and extra id fields
- The cost of the joins in this case is completely negligible because of keys.
Personally I’d say go with the M2M tables, but I don’t know your specific application. If you’re going to be working with a massive amount of data it’s likely worth grabbing a representative dataset and testing both methods with it.