CSV file upload from buffer to S3


Okay, disregard my earlier answer; I found the actual problem.

According to the boto3 documentation for the upload_fileobj function, the first parameter (Fileobj) needs to implement a read() method that returns bytes:

Fileobj (a file-like object) — A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

The read() function on a _io.StringIO object returns a string, not bytes. I would suggest swapping the StringIO object for a BytesIO object, adding in the necessary encoding and decoding.
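For illustration, the type difference is easy to see in a REPL:

import io

io.StringIO("a,b,c").read()    # 'a,b,c'  -> str, which upload_fileobj rejects
io.BytesIO(b"a,b,c").read()    # b'a,b,c' -> bytes, which it requires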

Here is a minimal working example. It’s not the most efficient solution – the basic idea is to copy the contents over to a second BytesIO object.

import io
import boto3
import csv

# Write the CSV into a text buffer first.
buff = io.StringIO()

writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

# Copy the text over into a bytes buffer, encoding it (UTF-8 by default)
# so that read() returns bytes, as upload_fileobj requires.
buff2 = io.BytesIO(buff.getvalue().encode())

bucket = 'changeme'
key = 'blah.csv'

client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
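
If you want to avoid the copy, one alternative (a sketch, not part of the original answer) is to write the CSV text straight into a single BytesIO through an io.TextIOWrapper:

import io
import boto3
import csv

buff = io.BytesIO()
# Wrap the binary buffer so csv.writer can write text to it; write_through
# pushes each write straight down to the BytesIO instead of buffering it.
wrapper = io.TextIOWrapper(buff, encoding='utf-8', newline='', write_through=True)

writer = csv.writer(wrapper, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

wrapper.flush()
buff.seek(0)  # rewind: upload_fileobj reads from the current position

client = boto3.client('s3')
client.upload_fileobj(buff, 'changeme', 'blah.csv')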


As explained here, using the method put_object rather than upload_fileobj would do the job just right with an io.StringIO buffer.

So here, to match the initial example:

client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)

would become

client = boto3.client('s3')
client.put_object(Body=buff2, Bucket=bucket, Key=key, ContentType='application/vnd.ms-excel')
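
Since put_object's Body parameter accepts raw bytes, you can also skip the second buffer object entirely and pass the StringIO's contents directly; a sketch:

client = boto3.client('s3')
# Body accepts bytes, so encode the StringIO's contents and pass them
# straight in, with no intermediate BytesIO:
client.put_object(Body=buff.getvalue().encode(), Bucket=bucket, Key=key,
                  ContentType='application/vnd.ms-excel')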


Have you tried calling buff.flush() first? It’s possible that your entirely sensible debugging check (calling getvalue()) creates the illusion that buff has been written to, when without a flush it actually hasn’t been.
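
A sketch of that check, with the caveat that flush() is a no-op on a bare io.StringIO (it matters when writes go through a buffered wrapper such as io.TextIOWrapper):

buff.flush()  # no-op for a bare StringIO; flushes pending writes otherwise
buff.seek(0)  # rewinding before the upload is a common companion fix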


You can use something like goofys to mount the bucket as a local filesystem and redirect output to S3.
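
For example, with the bucket already mounted (the mount point /mnt/s3 here is hypothetical, and the goofys mount command itself isn't shown), ordinary file I/O ends up in S3:

import csv

# Assumes goofys has mounted the bucket at /mnt/s3; writing a file under
# the mount point stores it in the bucket.
with open('/mnt/s3/blah.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b", "c"])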

