[Fixed]-Django: Distinct foreign keys

29👍

Queries don’t work like that – either in Django’s ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you’ll need to do two queries to get the actual Log entries. Something like:

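# values_list(..., flat=True) yields plain project IDs instead of 1-tuples.
# Caveat: ordering by 'date' adds it to the SELECT DISTINCT columns, so IDs can repeat.
id_list = Log.objects.order_by('-date').values_list('project_id', flat=True).distinct()[:4]
entries = Log.objects.filter(project_id__in=id_list)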
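As a rough illustration of what the first step is meant to compute, here is a minimal plain-Python sketch that does not use the ORM. The function name `latest_distinct_project_ids` and the `(project_id, date)` sample rows are hypothetical, not part of the original answer.

```python
def latest_distinct_project_ids(rows, limit=4):
    """Return up to `limit` distinct project ids, most recent log first.

    `rows` is an iterable of (project_id, date) pairs (hypothetical shape).
    """
    seen = []
    # Walk the rows newest first and keep the first occurrence of each project.
    for project_id, _date in sorted(rows, key=lambda r: r[1], reverse=True):
        if project_id not in seen:
            seen.append(project_id)
            if len(seen) == limit:
                break
    return seen


# Hypothetical sample data: ISO date strings sort chronologically.
sample = [
    (1, "2020-01-01"),
    (1, "2020-01-05"),
    (2, "2020-01-03"),
    (3, "2020-01-04"),
    (2, "2020-01-02"),
]
ids = latest_distinct_project_ids(sample, limit=2)
```

With the sample rows, project 1 has the newest entry and project 3 the next newest, so those two IDs come back in that order.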

3👍

Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:

SELECT project_id, max(logs.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;

Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn’t work on other versions/databases, so I’ll do it the more complex way:

SELECT logs.*
FROM logs JOIN (
    SELECT project_id, max(logs.date) as max_date
    FROM logs
    GROUP BY project_id
    ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
   AND logs.date = latest.max_date;
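The join approach can be tried out with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the `logs` table layout (`id`, `project_id`, `msg`, `date`), the sample rows, and the helper name `latest_logs_by_join` are all made up for illustration, and an outer `ORDER BY` is added so the result order is predictable.

```python
import sqlite3


def latest_logs_by_join(conn, limit=4):
    """Newest log row for each of the `limit` most recently active projects."""
    sql = """
        SELECT logs.id, logs.project_id, logs.msg, logs.date
        FROM logs JOIN (
            SELECT project_id, max(date) AS max_date
            FROM logs
            GROUP BY project_id
            ORDER BY max_date DESC LIMIT ? ) AS latest
        ON logs.project_id = latest.project_id
           AND logs.date = latest.max_date
        ORDER BY logs.date DESC
    """
    return conn.execute(sql, (limit,)).fetchall()


# Hypothetical in-memory table and sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, msg TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "2020-01-01"),
        (2, 1, "b", "2020-01-05"),
        (3, 2, "c", "2020-01-03"),
        (4, 3, "d", "2020-01-04"),
        (5, 2, "e", "2020-01-02"),
    ],
)
rows = latest_logs_by_join(conn, limit=2)
```

Note that if two entries of one project share the same maximum date, the join returns both of them.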

Now, if you have access to windowing functions, it’s a bit neater (I think anyway), and certainly faster to execute:

SELECT * FROM (
   SELECT logs.field1, logs.field2, logs.field3, logs.date,
       rank() over ( partition by project_id 
                     order by "date" DESC ) as dateorder
   FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logsort."date" DESC LIMIT 4;
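The window-function variant can be sketched the same way with `sqlite3`, assuming the bundled SQLite is version 3.25 or newer (older versions lack window functions). The table layout, sample rows, and the helper name `latest_logs_by_window` are hypothetical.

```python
import sqlite3


def latest_logs_by_window(conn, limit=4):
    """Newest log row per project via rank(), limited to the newest `limit` rows."""
    sql = """
        SELECT id, project_id, msg, date FROM (
            SELECT logs.id, logs.project_id, logs.msg, logs.date,
                   rank() OVER ( PARTITION BY project_id
                                 ORDER BY date DESC ) AS dateorder
            FROM logs ) AS logsort
        WHERE dateorder = 1
        ORDER BY date DESC LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


# Hypothetical in-memory table and sample data.
window_conn = sqlite3.connect(":memory:")
window_conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, msg TEXT, date TEXT)"
)
window_conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "2020-01-01"),
        (2, 1, "b", "2020-01-05"),
        (3, 2, "c", "2020-01-03"),
        (4, 3, "d", "2020-01-04"),
        (5, 2, "e", "2020-01-02"),
    ],
)
window_rows = latest_logs_by_window(window_conn, limit=2)
```

Because `rank()` assigns the same rank to ties, a project with two entries on its latest date would contribute both rows; `row_number()` is the usual alternative when exactly one row per project is wanted.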

OK, maybe it’s not easier to understand, but take my word for it, it runs worlds faster on a large database.

I’m not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you’d need to join against the projects table.

1👍

You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).

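from django.db.models import Max

# Project IDs ordered by each project's most recent log date.
latest_ids_per_project = Log.objects.values_list(
    'project').annotate(latest=Max('date')).order_by(
    '-latest').values_list('project')

# The subquery yields project IDs, so filter on project_id rather than id.
# This returns every log entry of the four most recently active projects.
log_objects = Log.objects.filter(
     project_id__in=latest_ids_per_project[:4]).order_by('-date')

This looks a bit convoluted, but it actually results in a surprisingly compact query, roughly:

SELECT "log"."id",
       "log"."project_id",
       "log"."msg",
       "log"."date"
FROM "log"
WHERE "log"."project_id" IN
    (SELECT U0."project_id"
     FROM "log" U0
     GROUP BY U0."project_id"
     ORDER BY MAX(U0."date") DESC
     LIMIT 4)
ORDER BY "log"."date" DESC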

0👍

I know this is an old post, but in Django 2.0, I think you could just use:

Log.objects.values('project').distinct().order_by('project')[:4]
👤dchess
