Django REST framework: Group by fields and add extra contents

8👍

I will explain it more generally, using a graph of the database models. The approach can be applied to any “GROUP BY” query with extra content.

          +-------------------------+
          | MovieTicket (booked_at) |
          +-----+--------------+----+
                |              |
      +---------+--------+  +--+---+
      |    Show (time)   |  | User |
      ++----------------++  +------+
       |                |
+------+-------+  +-----+------+
| Movie (name) |  | Day (date) |
+--------------+  +------------+

The question is: how do you summarize MovieTicket (the topmost object), grouped by Show (one related object) and filtered by User (another related object), while reporting details from deeper related objects (Movie and Day) and sorting the results by a field aggregated from the topmost model within each group (here, the booking time of the most recent MovieTicket in the group)?

The answer, explained in more general steps:

  • Start with the topmost model:
    (MovieTicket.objects ...)
  • Apply filters:
    .filter(user=user)
  • It is important to group by the pk of the nearest related models (at least those that are not made constant by the filter). Here that is only “Show”, because “User” is already restricted to a single user by the filter:
    .values('show_id')
    Even if all the other fields were unique together (show__movie__name, show__day__date, show__time), it is better for the database engine optimizer to group the query by show_id, because all these other fields depend on show_id and cannot change the number of groups.
  • Annotate necessary aggregation functions:
    .annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
  • Add required dependent fields:
    .values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
  • Sort what is necessary:
    .order_by('-last_booking') (descending, from the latest to the oldest)
    It is very important not to output or sort by any field of the topmost model without wrapping it in an aggregation function. (Min and Max are good for sampling a value from a group.) Every field not wrapped in an aggregation is added to the “GROUP BY” list, and that breaks the intended groups. For example, several tickets to the same show could be booked gradually for friends, but they should be counted together and reported by the latest booking.
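
The warning in the last step can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module with a made-up single-table schema (the table name `ticket` and the sample rows are assumptions for illustration, not the original models). It compares grouping by show_id alone with grouping that also includes the raw booked_at column, which is roughly what happens when a non-aggregated field of the topmost model is output or used for sorting.

```python
import sqlite3

# Hypothetical, simplified table: one row per booked ticket.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, show_id INTEGER, booked_at TEXT)"
)
conn.executemany(
    "INSERT INTO ticket (show_id, booked_at) VALUES (?, ?)",
    [
        (4, "2017-04-20 10:00"),
        (4, "2017-04-21 18:30"),  # second ticket to the same show, booked later
        (7, "2017-04-21 09:15"),
    ],
)

# Intended grouping: booked_at appears only inside an aggregate.
good = conn.execute(
    "SELECT show_id, COUNT(show_id), MAX(booked_at) AS last_booking "
    "FROM ticket GROUP BY show_id ORDER BY last_booking DESC"
).fetchall()

# Broken grouping: the raw booked_at column joins the GROUP BY list,
# so every distinct booking time becomes its own group.
broken = conn.execute(
    "SELECT show_id, COUNT(show_id), booked_at "
    "FROM ticket GROUP BY show_id, booked_at ORDER BY booked_at DESC"
).fetchall()

print(good)    # two groups, show 4 counted twice
print(broken)  # three groups, every count is 1
```

With these sample rows the first query returns one row per show, while the second returns one row per ticket, which matches the "count is always 1" symptom.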

Put it together:

from django.db.models import Count, Max

qs = (MovieTicket.objects
      .filter(user=user)
      .values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
      .annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
      .order_by('-last_booking')
      )

The queryset can easily be converted to JSON as zaphod100.10 demonstrated in his answer, or, for people not interested in Django REST framework, directly like this:

from collections import OrderedDict
import json

print(json.dumps([
    OrderedDict([                       # OrderedDict takes a list of (key, value) pairs
        ('show', x['show_id']),
        ('movie', x['show__movie__name']),
        ('time', x['show__time']),      # add time formatting
        ('day', x['show__day__date']),  # add date formatting
        ('total_tickets', x['total_tickets']),
        # field 'last_booking' is unused
    ]) for x in qs
], default=str))  # default=str serializes date/time values that are not formatted yet
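
The two "add formatting" comments can be filled in many ways. As a hedged sketch, assuming the values are `datetime.time` and `datetime.date` objects (what Django normally returns for a TimeField and a DateField) and that the display style from the question ("07:00 pm", "23 Apr 2017") is wanted, it could look like this. The helper names `format_time` and `format_day` and the sample row are hypothetical.

```python
import datetime
import json
from collections import OrderedDict


def format_time(t):
    # 12-hour clock with lowercase am/pm; %p depends on the active locale
    return t.strftime("%I:%M %p").lower()


def format_day(d):
    # Day, abbreviated month name, year; %b depends on the active locale
    return d.strftime("%d %b %Y")


# Stand-in for one row of the values() queryset (sample data, not real output)
row = {
    "show_id": 4,
    "show__movie__name": "Lion king",
    "show__time": datetime.time(19, 0),
    "show__day__date": datetime.date(2017, 4, 23),
    "total_tickets": 2,
}

item = OrderedDict([
    ("show", row["show_id"]),
    ("movie", row["show__movie__name"]),
    ("time", format_time(row["show__time"])),
    ("day", format_day(row["show__day__date"])),
    ("total_tickets", row["total_tickets"]),
])

print(json.dumps(item))
```

Formatting in Python keeps the database query unchanged; only the presentation layer knows about the display format.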

Verify the query:

>>> print(str(qs.query))
SELECT app_movieticket.show_id, app_movie.name, app_day.date, app_show.time,
    COUNT(app_movieticket.show_id) AS total_tickets,
    MAX(app_movieticket.booked_at) AS last_booking
FROM app_movieticket
INNER JOIN app_show ON (app_movieticket.show_id = app_show.id)
INNER JOIN app_movie ON (app_show.movie_id = app_movie.id)
INNER JOIN app_day ON (app_show.day_id = app_day.id)
WHERE app_movieticket.user_id = 23
GROUP BY app_movieticket.show_id, app_movie.name, app_day.date, app_show.time
ORDER BY last_booking DESC
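
To check what this SQL returns without a Django project, it can be run against a throwaway SQLite database. This is only a sketch: the table and column names are taken from the printed query, while the column types and all sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal tables matching the names in the printed query (types are assumed)
conn.executescript("""
CREATE TABLE app_movie (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_day (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE app_show (id INTEGER PRIMARY KEY, movie_id INTEGER,
                       day_id INTEGER, time TEXT);
CREATE TABLE app_movieticket (id INTEGER PRIMARY KEY, show_id INTEGER,
                              user_id INTEGER, booked_at TEXT);
INSERT INTO app_movie VALUES (1, 'Lion king'), (2, 'Gone girl');
INSERT INTO app_day VALUES (1, '2017-04-23');
INSERT INTO app_show VALUES (4, 1, 1, '19:00'), (7, 2, 1, '14:30');
INSERT INTO app_movieticket (show_id, user_id, booked_at) VALUES
    (4, 23, '2017-04-20 10:00'),
    (7, 23, '2017-04-21 09:15'),
    (4, 23, '2017-04-22 18:30'),
    (4, 99, '2017-04-22 20:00');
""")

SQL = """
SELECT app_movieticket.show_id, app_movie.name, app_day.date, app_show.time,
    COUNT(app_movieticket.show_id) AS total_tickets,
    MAX(app_movieticket.booked_at) AS last_booking
FROM app_movieticket
INNER JOIN app_show ON (app_movieticket.show_id = app_show.id)
INNER JOIN app_movie ON (app_show.movie_id = app_movie.id)
INNER JOIN app_day ON (app_show.day_id = app_day.id)
WHERE app_movieticket.user_id = ?
GROUP BY app_movieticket.show_id, app_movie.name, app_day.date, app_show.time
ORDER BY last_booking DESC
"""

rows = conn.execute(SQL, (23,)).fetchall()
for row in rows:
    print(row)

# The extra GROUP BY columns depend on show_id, so the number of groups
# equals the number of distinct shows booked by this user.
distinct_shows = conn.execute(
    "SELECT COUNT(DISTINCT show_id) FROM app_movieticket WHERE user_id = ?", (23,)
).fetchone()[0]
print(len(rows) == distinct_shows)
```

The ticket of the other user (99) is filtered out, and the two tickets of user 23 for show 4 end up in one row.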

Notes:

  • The graph of models is similar to ManyToMany relationship, but MovieTickets are individual objects and probably hold seat numbers.

  • It would be easy to get a similar report for multiple users with one query: the field ‘user_id’ and the user's name would be added to “values(…)”.

  • The related model Day is not intuitive, but it clearly has a field date and hopefully also some nontrivial fields, perhaps important for scheduling shows around events like cinema holidays. It would be useful to make the field ‘date’ the primary key of the Day model and so spare a relationship lookup in many queries like this one.

(All the important parts of this answer can be found in the two oldest answers, by Todor and zaphod100.10. Unfortunately, those answers have not been combined, and nobody but me has up-voted them, even though the question has many up-votes.)

3👍

I would like to filter MovieTicket with its user field and group them
according to its show field, and order them by the recent booked time.

This queryset will give you exactly what you want:

tickets = (MovieTicket.objects
            .filter(user=request.user)
            .values('show')
            .annotate(last_booking=Max('booked_at'))
            .order_by('-last_booking')
)

And respond back with json data using Django rest framework like this:
[
    {
        "show": 4,
        "movie": "Lion king",
        "time": "07:00 pm",
        "day": "23 Apr 2017",
        "total_tickets": 2
    },
    {
        "show": 7,
        "movie": "Gone girl",
        "time": "02:30 pm",
        "day": "23 Apr 2017",
        "total_tickets": 1
    }
]

Well, this JSON data does not match the query you described. You can add total_tickets by extending the annotation, and add show__movie__name to the .values clause. This changes the grouping to show + movie name, but since a show has only one movie name, it won't matter.

However, you cannot add show__day__date and show__time, because one show can have multiple date-times, so which one would you want from a group? You could, for example, fetch the maximum day and time, but this does not guarantee that a show actually takes place at that day and time, because these are separate fields that are not related to each other. So the final attempt may look like:

tickets = (MovieTicket.objects
            .filter(user=request.user)
            .values('show', 'show__movie__name')
            .annotate(
                last_booking=Max('booked_at'),
                total_tickets=Count('pk'),
                last_day=Max('show__day'),
                last_time=Max('show__time'),
            )
            .order_by('-last_booking')
)
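
The caveat about independent maxima can be seen in plain Python. This sketch assumes, as this answer does, that one group can contain several day and time combinations; the sample values are made up for illustration.

```python
import datetime

# Hypothetical (day, time) pairs that fall into one group
showings = [
    (datetime.date(2017, 4, 23), datetime.time(19, 0)),
    (datetime.date(2017, 4, 24), datetime.time(14, 30)),
]

# Aggregate each column separately, the way two independent Max(...)
# annotations would
last_day = max(day for day, _ in showings)
last_time = max(time for _, time in showings)

combined = (last_day, last_time)
is_real_showing = combined in showings
print(combined, is_real_showing)

# Taking the maximum of the pair keeps day and time together
latest_showing = max(showings)
print(latest_showing)
```

Here the combined pair is 24 April at 19:00, which never occurs in the data, while the maximum of the pairs is the real showing on 24 April at 14:30.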
👤Todor

1👍

You have to group by show and then count the total number of movie tickets.

MovieTicket.objects.filter(user=23).values('show').annotate(total_tickets=Count('show')).values('show', 'total_tickets', 'show__movie__name', 'show__time', 'show__day__date')

Use this serializer class for the above queryset. It will give the required JSON output.

from rest_framework import serializers

class MySerializer(serializers.Serializer):
    show = serializers.IntegerField()
    movie = serializers.CharField(source='show__movie__name')
    time = serializers.TimeField(source='show__time')
    day = serializers.DateField(source='show__day__date')
    total_tickets = serializers.IntegerField()

It is not possible to order_by booked_at, since that information is lost when we group by show. If we order by booked_at, the grouping happens on unique combinations of booked_at time and show id, which is why the ticket count came out as 1. Without order_by you will get the correct count.

EDIT:

Use this query:

queryset = (MovieTicket.objects.filter(user=23)
            .order_by('booked_at').values('show')
            .annotate(total_tickets=Count('show'))
            .values('show', 'total_tickets', 'show__movie__name',
                    'show__time', 'show__day__date'))

You cannot annotate on an annotated field, so you will have to compute the total ticket count in Python. To calculate the total_tickets count for unique show ids:

tickets = {}
for obj in queryset:
    if obj['show'] not in tickets.keys():
        tickets[obj['show']] = obj
    else:
        tickets[obj['show']]['total_tickets'] += obj['total_tickets']

The final list of objects you need is tickets.values().
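
As a self-contained illustration of the merge loop, the sketch below runs it on a plain list of dictionaries standing in for the queryset. The sample rows are invented, and each first row is copied so the input is not modified, which is a small deviation from the loop above.

```python
# Stand-in for the values() queryset: one row per (show, booked_at) group
rows = [
    {"show": 4, "total_tickets": 1, "show__movie__name": "Lion king"},
    {"show": 7, "total_tickets": 1, "show__movie__name": "Gone girl"},
    {"show": 4, "total_tickets": 1, "show__movie__name": "Lion king"},
]

tickets = {}
for obj in rows:
    if obj["show"] not in tickets:
        # Copy the first row seen for this show so the input stays untouched
        tickets[obj["show"]] = dict(obj)
    else:
        # Later rows for the same show only add to the count
        tickets[obj["show"]]["total_tickets"] += obj["total_tickets"]

result = list(tickets.values())
print(result)
```

Because dictionaries keep insertion order, the result follows the order in which each show first appears in the rows.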

The same serializer above can be used with these objects.

1👍

You can try this.

Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('movieticket__user')).values('movie__name', 'time', 'day').distinct()

Or:

Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('id')).values('movie__name', 'time', 'day').distinct()
👤Ashish
