[Solved]-How to avoid having idle connection timeout while uploading large file?

1👍

You can create an upload handler to upload the file directly to S3. That way you shouldn’t hit the connection timeout.

https://docs.djangoproject.com/en/1.10/ref/files/uploads/#writing-custom-upload-handlers

I did some tests and it worked perfectly in my case.

You have to start a new multipart upload (with boto, for example) and send the chunks progressively.

Don’t forget to validate the chunk size: 5 MB is the minimum part size if your file contains more than one part (an S3 limitation).

I think this is the best alternative to django-queued-storage if you really want to upload directly to S3 and avoid connection timeouts.

You’ll probably also need to create your own FileField to manage the file correctly and avoid sending it a second time (a rough sketch of that follows after the handler code below).

The following example is with S3BotoStorage.

import uuid
from io import BytesIO

from django.core.files.storage import default_storage
from django.core.files.uploadhandler import FileUploadHandler
from storages.utils import setting

# S3 requires every part of a multipart upload except the last one to be at least 5 MB.
S3_MINIMUM_PART_SIZE = 5242880


class S3FileUploadHandler(FileUploadHandler):
    # Size of the chunks Django feeds to receive_data_chunk().
    chunk_size = setting('S3_FILE_UPLOAD_HANDLER_BUFFER_SIZE', S3_MINIMUM_PART_SIZE)

    def __init__(self, request=None):
        super(S3FileUploadHandler, self).__init__(request)
        self.file = None
        self.part_num = 1          # S3 part numbers start at 1
        self.last_chunk = None     # held-back chunk, so a small final chunk can be merged into it
        self.multipart_upload = None

    def new_file(self, field_name, file_name, content_type, content_length, charset=None, content_type_extra=None):
        super(S3FileUploadHandler, self).new_file(field_name, file_name, content_type, content_length, charset, content_type_extra)
        # Keep the original name so it can be exposed on the finished file in file_complete().
        self.original_filename = file_name
        # Prefix the key with a uuid so concurrent uploads of the same file name don't collide.
        self.file_name = "{}_{}".format(uuid.uuid4(), file_name)

        default_storage.bucket.new_key(self.file_name)

        self.multipart_upload = default_storage.bucket.initiate_multipart_upload(self.file_name)

    def receive_data_chunk(self, raw_data, start):
        buffer_size = len(raw_data)

        if self.last_chunk:
            file_part = self.last_chunk

            # A chunk smaller than 5 MB can only be the final one, so merge it with the
            # held-back chunk to respect the S3 minimum part size.
            if buffer_size < S3_MINIMUM_PART_SIZE:
                file_part += raw_data
                self.last_chunk = None
            else:
                self.last_chunk = raw_data

            self.upload_part(part=file_part)
        else:
            self.last_chunk = raw_data

        # Returning None marks the chunk as consumed by this handler.
        return None

    def upload_part(self, part):
        self.multipart_upload.upload_part_from_file(
            fp=BytesIO(part),
            part_num=self.part_num,
            size=len(part)
        )
        self.part_num += 1

    def file_complete(self, file_size):
        # Flush whatever is still held back as the final (possibly smaller than 5 MB) part.
        if self.last_chunk:
            self.upload_part(part=self.last_chunk)

        self.multipart_upload.complete_upload()
        self.file = default_storage.open(self.file_name)
        self.file.original_filename = self.original_filename

        return self.file
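
As for the custom field mentioned above, here is a rough sketch of one way to do it (not part of the original answer: the class names are made up, and it assumes the handler above is active, so the incoming content has already been written to S3 and carries original_filename). The idea is simply to skip the storage save when the file is already there:

from django.db.models.fields.files import FieldFile, FileField


class S3UploadedFieldFile(FieldFile):
    def save(self, name, content, save=True):
        if getattr(content, 'original_filename', None):
            # The upload handler already wrote the file to S3; just record its key
            # instead of uploading it a second time through the storage backend.
            self.name = content.name
            setattr(self.instance, self.field.attname, self.name)
            self._committed = True
            if save:
                self.instance.save()
        else:
            super(S3UploadedFieldFile, self).save(name, content, save)


class S3UploadedFileField(FileField):
    attr_class = S3UploadedFieldFile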

3👍

I have faced the same issue and fixed it by using django-queued-storage on top of django-storages. What django-queued-storage does is create a Celery task when a file is received, which uploads it to the remote storage such as S3; in the meantime, if the file is accessed by anyone and is not yet available on S3, it is served from the local file system. This way you don’t have to wait for the file to be uploaded to S3 before sending a response back to the client.

Since your application is behind a load balancer, you might want to use a shared file system such as Amazon EFS for the above approach to work.
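
For completeness, this is roughly how django-queued-storage is wired up (a minimal sketch, assuming django-storages’ S3BotoStorage as the remote backend and a working Celery setup; the Document model is just an example):

from django.db import models
from queued_storage.backends import QueuedStorage

# Files are saved to the local storage first; a Celery task then transfers them to S3.
queued_s3_storage = QueuedStorage(
    local='django.core.files.storage.FileSystemStorage',
    remote='storages.backends.s3boto.S3BotoStorage',
)


class Document(models.Model):
    attachment = models.FileField(upload_to='documents/', storage=queued_s3_storage)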

1👍

You can try skipping the upload to your server entirely and uploading the file directly to S3, then only getting back a URL for your application.

There is an app for that, django-s3direct; you can give it a try.
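
A minimal sketch of the model side (assuming django-s3direct is installed and a matching destination is configured in S3DIRECT_DESTINATIONS in settings.py; the destination name and model here are just examples, and the exact settings format depends on the version):

from django.db import models
from s3direct.fields import S3DirectField


class Upload(models.Model):
    # The browser uploads the file straight to S3; only the resulting URL is stored.
    video = S3DirectField(dest='example_destination')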

👤Todor
