[Fixed]-Django Python Garbage Collection woes

8👍

We did something like this for gunicorn. Whichever WSGI server you use, you need to find the hook that runs AFTER the response is sent, not before. Django has a request_finished signal but, at least at the time of writing, that signal still fires before the response is sent.

For gunicorn, you need to define two hook functions in the config, like so:

import gc


def pre_request(worker, req):
    # disable gc until end of request
    gc.disable()


def post_request(worker, req, environ, resp):
    # enable gc after a request
    gc.enable()

The post_request hook runs after the HTTP response has been delivered, so it is a good time for garbage collection.
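Note that gc.enable() only switches automatic collection back on; it does not run a collection by itself. As a variant (not the configuration from the answer above), a minimal sketch that forces a collection in post_request before re-enabling the collector could look like this, assuming you are happy to pay for a full collection after every request:

```python
import gc


def pre_request(worker, req):
    # Pause automatic collection while the request is being handled.
    gc.disable()


def post_request(worker, req, environ, resp):
    # Assumption: we want a full collection here, rather than waiting
    # for the allocation thresholds to trigger one later.
    gc.collect()
    # Restore automatic collection until the next request arrives.
    gc.enable()
```

Whether the explicit gc.collect() is worth it depends on how many reference cycles your requests create, so it is worth measuring before adopting it.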

👤dalore

4👍

I believe one option would be to completely disable garbage collection and then manually collect at the end of a request as suggested here: How does the Garbage Collection mechanism work?

I imagine that you could disable the GC in your settings.py file.

If you want to run garbage collection on every request, I would suggest writing some middleware that does it in its process_response method:

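import gc


class GCMiddleware(object):
    # Old-style middleware hook: called with each response on its way out.
    def process_response(self, request, response):
        gc.collect()  # run a full collection before handing the response back
        return response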
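The class above uses the old-style middleware hooks. On Django 1.10 and later, middleware is normally written as a callable that wraps get_response, so a rough equivalent might look like the sketch below. The class name is reused only for illustration, and the gc.collect() call still runs before the response is handed back to the server:

```python
import gc


class GCMiddleware:
    """New-style middleware sketch: collect garbage once per request."""

    def __init__(self, get_response):
        # get_response is the next middleware or the view itself.
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # The view has finished; run a full collection before returning.
        gc.collect()
        return response
```

To use it, the class would be listed in the MIDDLEWARE setting by its dotted path, which depends on where you put the module.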

1👍

An alternative might be to disable GC altogether, and configure mod_wsgi (or whatever you’re using) to kill and restart processes more frequently.
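If the server happens to be gunicorn rather than mod_wsgi, this idea can be sketched in the config file with the max_requests setting and the post_fork hook. The numbers below are illustrative assumptions, not recommendations:

```python
# gunicorn.conf.py (sketch)
import gc

# Restart each worker after roughly this many requests (illustrative values).
max_requests = 1000
max_requests_jitter = 100  # stagger restarts so workers do not all recycle at once


def post_fork(server, worker):
    # Runs in each freshly forked worker: turn the cyclic collector off.
    gc.disable()
```

Reference counting still frees most objects immediately; only objects caught in reference cycles pile up, and those are reclaimed when the worker process is restarted.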

0👍

My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.

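import gc
from django.http import HttpResponse

def my_view(request):
    gc.disable()  # turn off GC
    # do stuff
    resp = HttpResponse()
    gc.enable()  # turn on GC
    return resp

I’m not sure, but instead of calling gc.enable() right before the return, you might be able to spawn a thread that turns GC back on after 0.1 sec.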
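The thread idea could be sketched with threading.Timer from the standard library. The helper name collect_later is hypothetical, and the 0.1 second delay is just the figure mentioned above; nothing guarantees the response has actually been sent by then:

```python
import gc
import threading


def collect_later(delay=0.1):
    """Re-enable GC and run a gen 2 sweep after `delay` seconds (hypothetical helper)."""

    def _run():
        gc.enable()
        gc.collect(2)  # generation 2 means a full collection

    timer = threading.Timer(delay, _run)
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()
    return timer
```

A view would call collect_later() just before returning its response instead of calling gc.enable() directly. Keep in mind that in CPython the collection holds the GIL while it runs, so other Python threads in the same process still pause for its duration.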

To make sure that GC doesn’t happen until after the request is processed, if the thread spawning doesn’t work, you would need to modify Django itself or use some sort of Django hook, as dcurtis suggested.

If you’re dealing with performance-critical code, you might also want to consider using a manual memory management language like C/C++ for that part, and using Python simply to invoke/query it.

0👍

Building on the approach from @milkypostman, you can use gevent. You want one garbage collection call per request, but the problem with @milkypostman’s suggestion is that the call to gc.collect() still blocks the response from being returned. Gevent lets us return immediately and have the GC run after the middleware has returned.

First, in your wsgi file, make sure to apply gevent’s monkey patching and disable garbage collection. You could call gc.disable(), but some libraries have context managers that turn GC back on after disabling it (messagepack, for instance), so setting the threshold to 0 is stickier.

import gc
from gevent import monkey

# Disable garbage collection runs
gc.set_threshold(0)
# Apply gevent monkey magic
monkey.patch_all()
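To illustrate why the zero threshold survives where gc.disable() does not, here is a small standalone sketch. The gc_paused context manager is a hypothetical stand-in for a library helper that disables GC and then turns it back on; it is not taken from any real library:

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical library helper: pause GC, then re-enable it on exit."""
    gc.disable()
    try:
        yield
    finally:
        gc.enable()


def automatic_gc_active():
    # Automatic collection needs GC enabled AND a non-zero first threshold.
    return gc.isenabled() and gc.get_threshold()[0] > 0


def demo():
    original = gc.get_threshold()
    try:
        gc.disable()
        with gc_paused():
            pass
        after_disable = automatic_gc_active()  # the helper switched GC back on

        gc.set_threshold(0)
        with gc_paused():
            pass
        after_threshold = automatic_gc_active()  # threshold 0 still blocks it
        return after_disable, after_threshold
    finally:
        # Put the interpreter back the way we found it.
        gc.set_threshold(*original)
        gc.enable()
```

Calling demo() shows that the helper undoes gc.disable() but leaves the zero threshold in place, which is the behaviour the wsgi snippet relies on.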

Then create some middleware for Django like this:

from gc import collect
import gevent

class BaseMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response


class GcCollectMiddleware(BaseMiddleware):
    """Middleware which performs a non-blocking gc.collect()"""

    def __call__(self, request):
        response = self.get_response(request)
        gevent.spawn(collect)
        return response

The main difference from the previously suggested approach is that gc.collect() is wrapped in gevent.spawn, which does not block returning the HttpResponse, so your users get a snappier response!
