[Answer]-How to optimize this function for returning posts in a feed. Django


Your main concern is not something you need to worry about. Check the docs on When querysets are evaluated – you can define and add clauses to a queryset indefinitely and it won’t actually be run against the database until you call something that actually requires hitting the database.

What will require multiple queries is iterating through time until you hit a window that has posts. You’ll have better performance if you check the latest created time in one call, use that to work out your window, then limit the posts based on that and then sort by number of votes.

Something like:

unorderedPosts = Post.objects.all()
if request.user.is_authenticated(): 
    for preference in preferences: #filter by category preference
latest_post_datetime = unorderedPosts.aggregate(Max('created'))['created__max']

original_start_time = datetime.datetime.now() - datetime.timedelta(hours=startHours)    
latest_post_day_start_time = datetime.datetime.combine(latest_post_datetime.date(), original_start_time.time())
# a timedelta guaranteed to be less than 24 hours
time_shift = latest_post_day_start_time - latest_post_datetime
timewindow = datetime.timedelta(hours=4)
if time_shift.days >= 0:
    extra_windows_needed = time_shift.seconds / timewindow.seconds 
    # negative timedeltas store negative days, then positive seconds; negate
    extra_windows_needed = -(abs(time_shift).seconds) / timewindow.seconds
start_time = latest_post_day_start_time - (timewindow * (extra_windows_needed + 1))
posts = unorderedPosts.filter(created__gte=start_time).order_by('-upVote')
return posts

The math here is only right as long as your number of hours in your window (4) divides evenly into the day – otherwise calculating the correct offset gets trickier. Basically, you need to take time offset mod time window length, and I’m exploiting the fact that if you end up in the same calendar day I know the days mod four hours part works out.

Also, it doesn’t include an end time because your original logic doesn’t enforce one for the initial startHours period. It’ll only move the start time back out of that if there are none within it, so you don’t need to worry about stuff that’s too recent showing up.

This version makes at most three DB queries – one to get the category preferences for a logged in user, one to get latest_post_datetime and one to get posts, with confidence of having at least one matching post.

You might also consider profiling to see if your DB back end does better with a subquery to exclude unwanted categories:

if request.user.is_authenticated():
    unorderedPosts = unorderedPosts.exclude(category_name__in=request.user.categorypreference_set.filter(on=False).values_list('category_name')

As the __in lookup docs note, performance here varies with different database back ends.

Leave a comment