[Solved]-How to make Django Queryset that selects records with max value within a group

5πŸ‘

This Will help you

from django.db.models import Count, Max
MyClass.objects.values('my_integer').annotate(count=Count("my_integer"),latest_date=Max('created_ts'))

Data in table

  my_integer      created_ts
    -             -----------
    1 2015-09-08 20:05:51.144321+00:00
    1 2015-09-08 20:08:40.687936+00:00
    3 2015-09-08 20:08:58.472077+00:00
    2 2015-09-08 20:09:08.493748+00:00
    2 2015-09-08 20:10:20.906069+00:00

Output

[
    {'count': 2, 'latest_date': datetime.datetime(2015, 9, 8, 20, 8, 40, 687936, tzinfo=<UTC>), 'my_integer': 1},
    {'count': 2, 'latest_date': datetime.datetime(2015, 9, 8, 20, 10, 20, 906069, tzinfo=<UTC>), 'my_integer': 2},
    {'count': 1, 'latest_date': datetime.datetime(2015, 9, 8, 20, 8, 58, 472077, tzinfo=<UTC>), 'my_integer': 3}
]
πŸ‘€iammehrabalam

0πŸ‘

Try this;

from django.db.models import Max

MyClass.objects.values('my_integer').annotate(Max('created_ts'))
πŸ‘€Geo Jacob

0πŸ‘

This here is a really basic way to do it. Assuming the amount of data you have isn’t super large, this will work decently. You can use this in your views by overriding the get_queryset function and just returning filtered. Or you can use it as a static method on your class if you plan on using it everywhere.

values = MyClass.objects.order_by('-created_ts').all()

filtered = []
existing = []
for value in values:
    if value.my_integer not in existing:
        existing.append(value.my_integer) 
        filtered.append(value)  

Since the list is ordered by the most recent first they will get added to the existing first for that integer. I did some basic testing with it, but not much so there may be a flaw or two in there. Tested with sqlite.

Edit

Here is a much faster version.

def iter_tools():
    import itertools
    qs = MyClass.objects.all()
    filtered = []
    group_by = itertools.groupby(qs, lambda x: x.my_integer)
    for x in group_by:
        filtered.append(sorted(x[1], key=lambda x: x.created_ts, reverse=True)[0])
    return filtered

Essentially the way this is working is getting all of your objects from your db, grouping them by the integer, then sorting each group based on the timestamp and getting just the first one from each group. Speeding this even more is beyond my skills but I am sure there is some ways.

Here is the timeit of this one vs the one earlier with only like 6 entries in the db:

In[]: timeit.timeit(manual, number=1500)
Out[]: 0.5577559471130371
In[]: timeit.timeit(iter_tools, number=1500)
Out[]: 0.39012885093688965
-----------------------------------------------
In[]: timeit.timeit(manual, number=5000)
Out[]: 1.770777940750122
In[]: timeit.timeit(iter_tools, number=5000)
Out[]: 1.2411231994628906

Edit 2:
I created 60000 objects to the database to try it out with some data. I generated the data with django-fixtureless so the integers are completely random and the timestamp on all of them is a new datetime.now() for each object.

In[]: timeit.timeit(manual, number=1)
Out[]: 11.946185827255249
In[]: timeit.timeit(iter_tools, number=1)
Out[]: 0.7811920642852783
In[]: timeit.timeit(iter_tools, number=100)
Out[]: 77.93837308883667
In[]: MyClass.objects.all().count()
Out[]: 60000

A note about the DB: In the above examples I was using sqlite3 just on my local machine. I just now setup a quick little mysql server as a vm and received a much better speed result.

In[16]: MyClass.objects.all().count()
Out[16]: 60000
In[17]: timeit.timeit(iter_tools, number=100)
Out[17]: 49.636733055114746
In[18]: timeit.timeit(iter_tools, number=1)
Out[18]: 0.4923059940338135

Either way, you get the same objects returned. If performance is an issue I would recommend either using the itertools one or a custom sql query.

πŸ‘€Jared Mackey

0πŸ‘

You can either do a raw query:

MyClass.objects.raw("""
SELECT m1.id, m1.my_integer, m1.created_ts
FROM app_myclass AS m1, (
    SELECT my_integer, MAX(created_ts) AS max_created_ts
    FROM app_myclass
    GROUP BY my_integer
) AS m2
WHERE m1.my_integer = m2.my_integer AND m1.created_ts = m2.max_created_ts
"""))

Or use the Django ORM:

MyClass.objects.filter(
    created_ts__in=MyClass.objects.values(
        "my_integer"
    ).annotate(
        created_ts=models.Max(
            "created_ts"
        )
    ).values_list("created_ts", flat=True)
)

Note that this requires only a single SQL request, as you can see by printing len(django.db.connection.queries) before and after the query.

However, note that the latter solution only works if your created_ts attribute is guaranteed to be unique, which might not be your case.

If you’re not willing to use raw queries or an index on created_ts, then you should probably start using PostgreSQL and its DISTINCT ON feature, as suggested by other answers.

πŸ‘€RΓ©gis B.

-1πŸ‘

untested

results = MyClass.objects.all().distinct('my_integer').order_by('created_ts')
πŸ‘€Cody Bouche

-1πŸ‘

MyClass.objects.order_by('my_integer', '-created_ts').distinct('my_integer')

According to distinct, you need to call distinct on attributes, in the same order as in order_by. Hence order the elements based on integer and then in reverse timestamp, and call distinct on them, which returns the latest instance for every integer.

Leave a comment