[Fixed]-Most elegant approach for writing JSON data to a relational database using Django Models?


In my opinion the cleanest place for the code you need is as a new Manager method (eg from_json_string) on a custom manager for the NinjaData model.

I don’t think you should override the standard create, get_or_create etc methods since you’re doing something a bit different from what they normally do and it’s good to keep them working normally.

I realised I’d probably want this for myself at some point so I have coded up and lightly tested a generic function. Since it recursively goes through and affects other models I’m no longer certain it belongs as a Manager method and should probably be a stand-alone helper function.

def create_or_update_and_get(model_class, data):
    get_or_create_kwargs = {
        model_class._meta.pk.name: data.pop(model_class._meta.pk.name)
        # get
        instance = model_class.objects.get(**get_or_create_kwargs)
    except model_class.DoesNotExist:
        # create
        instance = model_class(**get_or_create_kwargs)
    # update (or finish creating)
    for key,value in data.items():
        field = model_class._meta.get_field(key)
        if not field:
        if isinstance(field, models.ManyToManyField):
            # can't add m2m until parent is saved
        elif isinstance(field, models.ForeignKey) and hasattr(value, 'items'):
            rel_instance = create_or_update_and_get(field.rel.to, value)
            setattr(instance, key, rel_instance)
            setattr(instance, key, value)
    # now add the m2m relations
    for field in model_class._meta.many_to_many:
        if field.name in data and hasattr(data[field.name], 'append'):
            for obj in data[field.name]:
                rel_instance = create_or_update_and_get(field.rel.to, obj)
                getattr(instance, field.name).add(rel_instance)
    return instance

# for example:
from django.utils.simplejson import simplejson as json

data = json.loads(ninja_json)
ninja = create_or_update_and_get(NinjaData, data)


I don’t know if you’re familiar with the terminology, but what you’re basically trying to do is de-serialize from a serialized/string format (in this case, JSON) into Python model objects.

I’m not familiar with Python libraries for doing this with JSON, so I can’t recommend/endorse any, but a search using terms like “python”, “deserialization”, “json”, “object”, and “graph” seems to reveal some Django documentation for serialization and the library jsonpickle on github.


I’ve actually had this same need, and I wrote a custom database field to handle it. Just save the following in a Python module in your project (say, for instance, a fields.py file in the appropriate app), and then import and use it:

class JSONField(models.TextField):
    """Specialized text field that holds JSON in the database, which is
    represented within Python as (usually) a dictionary."""

    __metaclass__ = models.SubfieldBase

    def __init__(self, blank=True, default='{}', help_text='Specialized text field that holds JSON in the database, which is represented within Python as (usually) a dictionary.', *args, **kwargs):
        super(JSONField, self).__init__(*args, blank=blank, default=default, help_text=help_text, **kwargs)

    def get_prep_value(self, value):
        if type(value) in (str, unicode) and len(value) == 0:
            value = None
        return json.dumps(value)

    def formfield(self, form_class=JSONFormField, **kwargs):
        return super(JSONField, self).formfield(form_class=form_class, **kwargs)

    def bound_data(self, data, initial):
        return json.dumps(data)

    def to_python(self, value):
        # lists, dicts, ints, and booleans are clearly fine as is
        if type(value) not in (str, unicode):
            return value

        # empty strings were intended to be null
        if len(value) == 0:
            return None

        # NaN should become null; Python doesn't have a NaN value
        if value == 'NaN':
            return None

        # try to tell the difference between a "normal" string
        # and serialized JSON
        if value not in ('true', 'false', 'null') and (value[0] not in ('{', '[', '"') or value[-1] not in ('}', ']', '"')):
            return value

        # okay, this is a JSON-serialized string
        return json.loads(value)

A couple things. First, if you’re using South, you’ll need to explain to it how your custom field works:

from south.modelsinspector import add_introspection_rules
add_introspection_rules([], [r'^feedmagnet\.tools\.fields\.models\.JSONField'])

Second, while I’ve done a lot of work to make sure that this custom field plays nice everywhere, such as cleanly going back and forth between the serialized format and Python. There’s one place where it doesn’t quite work right, which is when using it in conjunction with manage.py dumpdata, where it coalesces the Python to a string rather than dumping it into JSON, which isn’t what you want. I’ve found this to be a minor problem in actual practice.

More documentation on writing custom model fields.

I assert that this is the single best and most obvious way to do this. Note that I also assume that you don’t need to do lookups on this data — e.g. you’ll retrieve records based on other criteria, and this will come along with it. If you need to do lookups based on something in your JSON, make sure that it’s a true SQL field (and make sure it’s indexed!).

Leave a comment