Is it possible to use a natural key for a GenericForeignKey in Django?

10👍

TL;DR – Currently there is no sane way of doing so, short of creating a custom Serializer / Deserializer pair.

The problem with models that have generic relations is that Django doesn’t see target as a field at all, only target_content_type and target_object_id, and it tries to serialize and deserialize them individually.

The classes responsible for serializing and deserializing Django models live in the modules django.core.serializers.base and django.core.serializers.python. All the others (xml, json and yaml) extend one of them (and python extends base). The field serialization is done like this (irrelevant lines omitted):

    for obj in queryset:
        for field in concrete_model._meta.local_fields:
            if field.rel is None:
                self.handle_field(obj, field)
            else:
                self.handle_fk_field(obj, field)

Here’s the first complication: the foreign key to ContentType is handled fine, with natural keys as expected. But the PositiveIntegerField is handled by handle_field, which is implemented like this:

def handle_field(self, obj, field):
    value = field._get_val_from_obj(obj)
    # Protected types (i.e., primitives like None, numbers, dates,
    # and Decimals) are passed through as is. All other values are
    # converted to string first.
    if is_protected_type(value):
        self._current[field.name] = value
    else:
        self._current[field.name] = field.value_to_string(obj)

In other words, the only possibility for customization here (subclassing PositiveIntegerField and defining a custom value_to_string) will have no effect, since the serializer won’t call it: an integer is a protected type and is passed through as is. Changing the data type of target_object_id to something other than an integer would probably break a lot of other things, so it’s not an option.
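To see why the subclassing route is a dead end, here is a minimal, Django-free sketch of the same branch. ToyField, ToyObj and the local is_protected_type are stand-ins written for illustration; the tuple of protected types is an assumption based on the comment in the snippet above, not a copy of Django's code.

```python
import datetime
from decimal import Decimal

# Assumed set of "protected" primitives, mirroring the comment in handle_field.
PROTECTED_TYPES = (type(None), int, float, Decimal,
                   datetime.datetime, datetime.date, datetime.time)


def is_protected_type(value):
    """Stand-in for the helper of the same name used by the serializer."""
    return isinstance(value, PROTECTED_TYPES)


class ToyField:
    """Hypothetical field whose value_to_string tries to emit a natural key."""

    def __init__(self, name):
        self.name = name

    def value_to_string(self, obj):
        return "natural-key-of-%s" % getattr(obj, self.name)


class ToyObj:
    target_object_id = 7  # an integer, like a PositiveIntegerField value
    label = "raw"         # a string is not in the protected set


def handle_field(current, obj, field):
    """Same branching as the serializer method quoted above."""
    value = getattr(obj, field.name)
    if is_protected_type(value):
        current[field.name] = value  # passed through untouched
    else:
        current[field.name] = field.value_to_string(obj)


current = {}
handle_field(current, ToyObj(), ToyField("target_object_id"))
handle_field(current, ToyObj(), ToyField("label"))
```

The integer lands in current unchanged, so the custom value_to_string only ever runs for the non-protected value.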

We could define our custom handle_field to emit natural keys in this case, but then comes the second complication: the deserialization is done like this:

    for (field_name, field_value) in six.iteritems(d["fields"]):
        field = Model._meta.get_field(field_name)
        ...
            data[field.name] = field.to_python(field_value)

Even if we customized the to_python method, it acts on the field_value alone, out of the context of the object. That’s not a problem when using integers, since the value will be interpreted as the model’s primary key no matter what model it is. But to deserialize a natural key, we first need to know which model that key belongs to, and that information isn’t available unless we have a reference to the object (and the target_content_type field has already been deserialized).

As you can see, it’s not an impossible task – supporting natural keys in generic relations – but to accomplish that a lot of things would need to be changed in the serialization and deserialization code. The steps necessary, then (if anyone feels up to the task) are:

  • Create a custom Field extending PositiveIntegerField, with methods to encode/decode an object – calling the referenced models’ natural_key and get_by_natural_key;
  • Override the serializer’s handle_field to call the encoder if present;
  • Implement a custom deserializer that: 1) imposes some order in the fields, ensuring the content type is deserialized before the natural key; 2) calls the decoder, passing not only the field_value but also a reference to the decoded ContentType.
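The decoding half of these steps can be sketched without Django. Everything below is hypothetical: REGISTRY stands in for the ContentType lookup plus each model's get_by_natural_key, and the field names target_content_type / target_object_id follow the ones used above. It only illustrates the ordering constraint and is not a working serializer.

```python
# Hypothetical stand-in: (app_label, model) -> {natural key -> primary key}.
REGISTRY = {
    ("blog", "author"): {("alice",): 1, ("bob",): 2},
    ("blog", "tag"): {("alice",): 40},  # same natural key, different model
}


def decode_natural_key(content_type, natural_key):
    """Resolve a natural key to a pk; not possible without the content type."""
    return REGISTRY[tuple(content_type)][tuple(natural_key)]


def deserialize_fields(fields, ct_field="target_content_type",
                       fk_field="target_object_id"):
    """Decode the content type first, then the natural key that depends on it."""
    data = {}
    # 1) Impose an order: the content type field sorts before all others.
    ordered = sorted(fields, key=lambda name: name != ct_field)
    for name in ordered:
        value = fields[name]
        if name == ct_field:
            value = tuple(value)
        elif name == fk_field and isinstance(value, (list, tuple)):
            # 2) Hand the already-decoded content type to the decoder.
            value = decode_natural_key(data[ct_field], value)
        data[name] = value
    return data


as_author = deserialize_fields({
    "target_object_id": ["alice"],
    "target_content_type": ["blog", "author"],
})
as_tag = deserialize_fields({
    "target_object_id": ["alice"],
    "target_content_type": ["blog", "tag"],
})
```

The same natural key resolves to a different primary key in each call, which is why the decoder needs the content type as an argument.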

3👍

I’ve written a custom Serializer and Deserializer that support generic foreign keys. I checked them briefly and they seem to do the job.

This is what I came up with:

import json

from django.contrib.contenttypes.generic import GenericForeignKey
from django.utils import six
from django.core.serializers.json import Serializer as JSONSerializer
from django.core.serializers.python import Deserializer as \
    PythonDeserializer, _get_model
from django.core.serializers.base import DeserializationError
import sys


class Serializer(JSONSerializer):

    def get_dump_object(self, obj):
        dumped_object = super(Serializer, self).get_dump_object(obj)
        if self.use_natural_keys and hasattr(obj, 'natural_key'):
            dumped_object['pk'] = obj.natural_key()
            # Check if there are any generic fk's in this obj
            # and add a natural key to it which will be deserialized by a matching Deserializer.
            for virtual_field in obj._meta.virtual_fields:
                if type(virtual_field) == GenericForeignKey:
                    content_object = getattr(obj, virtual_field.name)
                    dumped_object['fields'][virtual_field.name + '_natural_key'] = content_object.natural_key()
        return dumped_object


def Deserializer(stream_or_string, **options):
    """
    Deserialize a stream or string of JSON data.
    """
    if not isinstance(stream_or_string, (bytes, six.string_types)):
        stream_or_string = stream_or_string.read()
    if isinstance(stream_or_string, bytes):
        stream_or_string = stream_or_string.decode('utf-8')
    try:
        objects = json.loads(stream_or_string)
        for obj in objects:
            Model = _get_model(obj['model'])
            if isinstance(obj['pk'], (tuple, list)):
                o = Model.objects.get_by_natural_key(*obj['pk'])
                obj['pk'] = o.pk
                # If the model has generic fk's, find each generic object by its
                # natural key and set the object id field to that object's pk.
                for virtual_field in Model._meta.virtual_fields:
                    if type(virtual_field) == GenericForeignKey:
                        natural_key_field_name = virtual_field.name + '_natural_key'
                        if natural_key_field_name in obj['fields']:
                            content_type = getattr(o, virtual_field.ct_field)
                            # Remove the extra key: it is not a real model field.
                            natural_key = obj['fields'].pop(natural_key_field_name)
                            content_object_by_natural_key = content_type.model_class().\
                                objects.get_by_natural_key(*natural_key)
                            obj['fields'][virtual_field.fk_field] = content_object_by_natural_key.pk
        for obj in PythonDeserializer(objects, **options):
            yield obj
    except GeneratorExit:
        raise
    except Exception as e:
        # Map to deserializer error
        six.reraise(DeserializationError, DeserializationError(e), sys.exc_info()[2])
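The core of the deserializer above is a plain rewrite of the parsed JSON before it is handed to the stock Python deserializer. The sketch below isolates that step so it can be run without Django. The resolve callback is a hypothetical stand-in for the get_by_natural_key lookup, the model and field names are made up, and, unlike the code above, it reads the content type from the record itself rather than from an object already in the database.

```python
import json


def rewrite_generic_fks(objects, gfk_names, resolve):
    """Swap each '<name>_natural_key' entry for an integer object id.

    gfk_names maps a generic FK name to its (ct_field, fk_field) pair;
    resolve(content_type, natural_key) is a hypothetical lookup returning a pk.
    """
    for obj in objects:
        fields = obj["fields"]
        for name, (ct_field, fk_field) in gfk_names.items():
            key = name + "_natural_key"
            if key in fields:
                # pop() also drops the extra key, which is not a model field
                natural_key = fields.pop(key)
                fields[fk_field] = resolve(fields[ct_field], natural_key)
    return objects


# Made-up record, shaped like what the serializer above emits.
dump = json.loads("""
[{"model": "app.comment", "pk": ["c1"],
  "fields": {"target_content_type": ["blog", "author"],
             "target_object_id": 999,
             "target_natural_key": ["alice"]}}]
""")

# Toy lookup table standing in for the database.
pks = {(("blog", "author"), ("alice",)): 1}
rewritten = rewrite_generic_fks(
    dump,
    {"target": ("target_content_type", "target_object_id")},
    lambda ct, nk: pks[(tuple(ct), tuple(nk))],
)
```

After the rewrite the record contains only real model fields, with the stale object id replaced by the pk found through the natural key.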

0👍

I updated OmriToptix’s answer for Django 2.2 and above.

The Model._meta.virtual_fields attribute was removed in Django 2.0, so the generic foreign keys have to be found through Model._meta.get_fields() instead.

So, the new Serializer and Deserializer:

import json

from django.contrib.contenttypes.fields import GenericForeignKey
from django.core.serializers.base import DeserializationError
from django.core.serializers.json import Serializer as JSONSerializer
from django.core.serializers.python import Deserializer as \
    PythonDeserializer, _get_model


class Serializer(JSONSerializer):

    def get_dump_object(self, obj):
        dumped_object = super().get_dump_object(obj)

        if hasattr(obj, 'natural_key'):
            dumped_object['pk'] = obj.natural_key()
            # Add a '<name>_natural_key' entry for every generic fk on the object.
            for field in obj._meta.get_fields():
                if type(field) == GenericForeignKey:
                    content_object = getattr(obj, field.name)
                    dumped_object['fields'][field.name + '_natural_key'] = content_object.natural_key()
        return dumped_object


def Deserializer(stream_or_string, **options):
    """
    Deserialize a stream or string of JSON data.
    """
    if not isinstance(stream_or_string, (bytes, str)):
        stream_or_string = stream_or_string.read()
    if isinstance(stream_or_string, bytes):
        stream_or_string = stream_or_string.decode('utf-8')
    try:
        objects = json.loads(stream_or_string)
        for obj in objects:
            Model = _get_model(obj['model'])
            if isinstance(obj['pk'], (tuple, list)):
                o = Model.objects.get_by_natural_key(*obj['pk'])
                obj['pk'] = o.pk
                # Resolve each generic fk's natural key to the target object's pk.
                for field in Model._meta.get_fields():
                    if type(field) == GenericForeignKey:
                        natural_key_field_name = field.name + '_natural_key'
                        if natural_key_field_name in obj['fields']:
                            content_type = getattr(o, field.ct_field)
                            content_object_by_natural_key = content_type.model_class().\
                                objects.get_by_natural_key(*obj['fields'][natural_key_field_name])
                            obj['fields'][field.fk_field] = content_object_by_natural_key.pk
                            # The extra key is not a model field, so drop it.
                            del obj['fields'][natural_key_field_name]

        for obj in PythonDeserializer(objects, **options):
            yield obj
    except (GeneratorExit, DeserializationError):
        raise
    except Exception as e:
        # Map to deserializer error
        raise DeserializationError(e) from e

Then, in your settings.py, set this configuration:

    SERIALIZATION_MODULES = {
        "json": "path.to.serializer_file",
    }

Now, you can use:

python3 manage.py dumpdata --natural-foreign --natural-primary > dump.json

Alternatively, if you only need to dump some of the data (a filtered queryset), you can do it from code:

from path.to.serializer_file import Serializer, Deserializer

# Serialize
registers = YourModel.objects.filter(some_attribute=some_value)
dump = Serializer().serialize(registers, use_natural_foreign_keys=True, use_natural_primary_keys=True)

# Deserialize
for deserialized_object in Deserializer(dump, use_natural_foreign_keys=True, use_natural_primary_keys=True):
    print(deserialized_object.object)  # See here https://docs.djangoproject.com/en/2.2/topics/serialization/
