Using StreamingHttpResponse with Django Rest Framework CSV

6👍

A simpler solution, inspired by @3066d0’s answer:

renderers.py

from rest_framework_csv.renderers import CSVStreamingRenderer


class ReportsRenderer(CSVStreamingRenderer):
    header = [ ... ]
    labels = { ... }

views.py

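from django.core.paginator import Paginator
from django.http import StreamingHttpResponse
from rest_framework.mixins import ListModelMixin
from rest_framework.viewsets import GenericViewSet


class ReportCSVViewset(ListModelMixin, GenericViewSet):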
    queryset = Report.objects.select_related('stuff')
    serializer_class = ReportCSVSerializer
    renderer_classes = [ReportsRenderer]
    PAGE_SIZE = 1000

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        response = StreamingHttpResponse(
            request.accepted_renderer.render(self._stream_serialized_data(queryset)),
            status=200,
            content_type="text/csv",
        )
        response["Content-Disposition"] = 'attachment; filename="reports.csv"'
        return response

    def _stream_serialized_data(self, queryset):
        serializer = self.get_serializer_class()
        paginator = Paginator(queryset, self.PAGE_SIZE)
        for page in paginator.page_range:
            yield from serializer(paginator.page(page).object_list, many=True).data

The point is that you pass a generator that yields serialized data as the data argument to the renderer; the CSVStreamingRenderer then does its thing and streams the response itself. I prefer this approach because you do not need to override any code from a third-party library.
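To make the paging generator concrete without a Django project, here is a minimal standard-library sketch of the same idea. The names `iter_serialized`, `reports`, and the `serialize` callable are hypothetical stand-ins for `_stream_serialized_data`, the queryset, and the serializer; a plain list slice stands in for `Paginator`.

```python
from typing import Callable, Iterator, Sequence


def iter_serialized(
    items: Sequence[dict],
    page_size: int,
    serialize: Callable[[dict], dict],
) -> Iterator[dict]:
    """Yield serialized rows one page at a time."""
    for start in range(0, len(items), page_size):
        # Only the current page is sliced out and serialized.
        page = items[start:start + page_size]
        yield from (serialize(item) for item in page)


# Hypothetical data standing in for a queryset of reports.
reports = [{"id": i, "title": f"report {i}"} for i in range(1, 6)]


def to_row(report: dict) -> dict:
    """Hypothetical serializer: upper-case the title."""
    return {"id": report["id"], "title": report["title"].upper()}


rows = iter_serialized(reports, page_size=2, serialize=to_row)
```

Nothing is serialized until the consumer (here, the renderer) starts iterating over `rows`, which is what lets the response start before the whole dataset has been processed.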

👤Andrii Vityk

2👍

Django’s StreamingHttpResponse can be much slower than a traditional HttpResponse for small responses.

Don’t use it unless you need to; the Django docs recommend that StreamingHttpResponse “should only be used in situations where it is absolutely required that the whole content isn’t iterated before transferring the data to the client.”

For your problem, you may also find it useful to set the chunk_size, switch to FileResponse, or go back to a normal Response (if using the REST framework) or HttpResponse.

Edit 1: About setting the chunk size:

With Django’s File API you can read a file in chunks (File.chunks()), so the whole file is never loaded into memory at once.
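As a rough illustration of chunked reading, here is a standard-library sketch that does by hand what a chunked file iterator does. `read_in_chunks` and the in-memory `payload` are hypothetical names for this example; in Django you would normally call `File.chunks()` instead of writing this yourself.

```python
import io
from functools import partial
from typing import BinaryIO, Iterator


def read_in_chunks(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    yield from iter(partial(fileobj.read, chunk_size), b"")


# 40 bytes of CSV held in memory, standing in for a large file on disk.
payload = io.BytesIO(b"a,b\n" * 10)
chunks = list(read_in_chunks(payload, chunk_size=16))
```

Only one chunk is held in memory at a time while iterating; the `list(...)` call here is just to inspect the result.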

I hope you find this useful.

👤ascoder

0👍

I ended up with a solution I was happy with, using the Paginator class with the queryset. First, I wrote a renderer that subclasses CSVStreamingRenderer, then used it in my CSVViewSet’s renderer_classes.

renderers.py

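import csv
import itertools

from django.core.paginator import Paginator
from rest_framework_csv.renderers import CSVStreamingRenderer


# Echo is not defined in the original answer; this is the pseudo-buffer
# pattern from the Django docs on streaming large CSV files.
class Echo:
    def write(self, value):
        # Return the value instead of storing it, so writerow() returns the row.
        return value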

# *****************************************************************************
# BatchedCSVRenderer
# *****************************************************************************


class BatchedCSVRenderer(CSVStreamingRenderer):

    """
    a CSV renderer that works with large querysets returning a generator
    function. Used with a streaming HTTP response, it provides response bytes
    instead of the client waiting for a long period of time
    """

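    def render(self, data, renderer_context=None, *args, **kwargs):
        # Avoid a shared mutable default argument.
        renderer_context = renderer_context or {}

        # render() is a generator, so a bare `return data` would emit nothing.
        # Anything that is not our queryset/serializer payload is handed to the
        # parent renderer instead.
        if 'queryset' not in data:
            yield from super().render(data, renderer_context=renderer_context)
            return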

        csv_buffer = Echo()
        csv_writer = csv.writer(csv_buffer)

        queryset = data['queryset']
        serializer = data['serializer']

        paginator = Paginator(queryset, 50)

        #  rendering the header or label field was taken from the tablize
        #  method in django rest framework csv

        header = renderer_context.get('header', self.header)
        labels = renderer_context.get('labels', self.labels)

        if labels:
            yield csv_writer.writerow([labels.get(x, x) for x in header])
        else:
            yield csv_writer.writerow(header)

        for page in paginator.page_range:
            serialized = serializer(
                paginator.page(page).object_list, many=True
            ).data

            #  we use the tablize function on the parent class to get a
            #  generator that we can use to yield a row

            table = self.tablize(
                serialized,
                header=header,
                labels=labels,
            )

            #  we want to remove the header from the tablized data so we use
            #  islice to take from 1 to the end of generator

            for row in itertools.islice(table, 1, None):
                yield csv_writer.writerow(row)

# *****************************************************************************
# ReportsRenderer
# *****************************************************************************


class ReportsRenderer(BatchedCSVRenderer):

    """
    A renderer for returning CSV data for reports

    """

    header = [ ... ]
    labels = { ... }

views.py

from django.http import StreamingHttpResponse
from rest_framework import mixins, viewsets

# *****************************************************************************
# CSVViewSet
# *****************************************************************************


class CSVViewSet(
        mixins.ListModelMixin,
        viewsets.GenericViewSet,
):

    def list(self, request, *args, **kwargs):
        queryset = self.get_queryset()

        return StreamingHttpResponse(
            request.accepted_renderer.render({
                'queryset': queryset,
                'serializer': self.get_serializer_class(),
            })
        )

# *****************************************************************************
# ReportsViewset
# *****************************************************************************


class ReportCSVViewset(CSVViewSet):

    """
    Viewset for report CSV output

    """

    renderer_classes = [ReportsRenderer]
    serializer_class = serializers.ReportCSVSerializer

    def get_queryset(self):
        return Report.objects.filter(...)

This might seem like a lot for a streaming response, but we reuse BatchedCSVRenderer and CSVViewSet in a bunch of other places. If you’re running your server behind nginx, you may also need to adjust its settings to allow streaming responses.
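The renderer yields the return value of `csv_writer.writerow(...)`, which only works when the writer’s target hands back what it is given. Here is a minimal standard-library sketch of that pseudo-buffer pattern, including the `islice` step that drops each page’s repeated header row. `table_for_page`, `stream_csv`, and the sample `pages` are hypothetical names for this example; `table_for_page` only imitates the shape of `tablize` (header first, then rows) and is not the library’s implementation.

```python
import csv
import itertools
from typing import Iterable, Iterator, List


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value: str) -> str:
        return value


def table_for_page(header: List[str], page: List[dict]) -> Iterator[list]:
    """Hypothetical stand-in for tablize(): header row first, then data rows."""
    yield header
    for row in page:
        yield [row.get(col, "") for col in header]


def stream_csv(header: List[str], pages: Iterable[List[dict]]) -> Iterator[str]:
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returned: the formatted line.
    yield writer.writerow(header)
    for page in pages:
        # Skip each page's own header row so the header appears only once.
        for row in itertools.islice(table_for_page(header, page), 1, None):
            yield writer.writerow(row)


pages = [
    [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}],
    [{"id": 3, "title": "c"}],
]
lines = list(stream_csv(["id", "title"], pages))
```

Each element of `lines` is one complete CSV line, so the generator can be handed to a streaming response as-is.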

Hopefully this helps anyone having the same goal. Let me know if there’s any other information I can provide.

👤3066d0

0👍

You need to provide the CSV headers (via the header param) when rendering the data:

renderer.render(data, renderer_context={'header': ['header1', 'header2', 'header3']})

If you don’t specify the header parameter, djangorestframework-csv will attempt to “guess” the CSV headers by itself. To “guess” the CSV headers, djangorestframework-csv will load all your data in memory, resulting in the delay you are experiencing.
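The effect of guessing headers can be shown with a small standard-library sketch. This is not djangorestframework-csv’s actual code; `iter_rows`, `source`, and `consumed` are hypothetical names used only to show why an unknown header forces the whole input to be read before the first row can be emitted.

```python
from typing import Iterable, Iterator, List, Optional

consumed: List[int] = []  # records which source items have been read so far


def source() -> Iterator[dict]:
    """Hypothetical lazy data source that logs each item as it is read."""
    for i in range(3):
        consumed.append(i)
        yield {"id": i, "name": f"n{i}"}


def iter_rows(records: Iterable[dict], header: Optional[List[str]] = None) -> Iterator[list]:
    if header is None:
        # No header given: every record must be read to learn the columns.
        records = list(records)
        header = sorted({key for record in records for key in record})
    yield header
    for record in records:
        yield [str(record.get(col, "")) for col in header]


# With an explicit header, the header row is available before any data is read.
lazy = iter_rows(source(), header=["id", "name"])
first_lazy = next(lazy)
consumed_with_header = len(consumed)

# Without one, the whole source is consumed before the header row is produced.
consumed.clear()
eager = iter_rows(source())
first_eager = next(eager)
consumed_without_header = len(consumed)
```

In the explicit-header case the first bytes can go out immediately; in the guessing case the client waits until every record has been loaded.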

👤spg
