[Fixed]-Django: Natural Sort QuerySet

15đź‘Ť

âś…

That’s not Django’s bug actually, that’s how databases work internally and for example looks like MySql for example doesn’t have natural sort by default (I googled not a lot, so maybe I am wrong there). But we can use some workaround for the case.

I put everything with examples & screenshots at https://gist.github.com/phpdude/8a45e1bd2943fa806aeffee94877680a

But basically for the given models.py file

from django.db import models


class Item(models.Model):
    signature = models.CharField('Signatur', max_length=50)

    def __str__(self):
        return self.signature

I’ve used admin.py just for example with the correct filter implementation

from django.contrib.admin import ModelAdmin, register, SimpleListFilter
from django.db.models.functions import Length, StrIndex, Substr, NullIf, Coalesce
from django.db.models import Value as V

from .models import Item


class AlphanumericSignatureFilter(SimpleListFilter):
    title = 'Signature (alphanumeric)'
    parameter_name = 'signature_alphanumeric'

    def lookups(self, request, model_admin):
        return (
            ('signature', 'Signature (alphanumeric)'),
        )

    def queryset(self, request, queryset):
        if self.value() == 'signature':
            return queryset.order_by(
                Coalesce(Substr('signature', V(0), NullIf(StrIndex('signature', V(' ')), V(0))), 'signature'),
                Length('signature'),
                'signature'
            )


@register(Item)
class Item(ModelAdmin):
    list_filter = [AlphanumericSignatureFilter]

Screenshots with examples

Raw user input data
Data sorted by natural key

A few references:

PS: It looks like db function Length(column_name) was added on Django 1.9, so you should be able to use it, but generally any Django version supports custom db ORM function call and you can call length() function of the field.


Extra example with using Python library natsort

It will work, but requires to load all the possible signatures before for correct sort since it sorts the rows list using python side, not DB side.

It works. But it could be pretty slow in case of a big table size.

From my point of view it should be used only on db tables sizes less than 50 000 rows (for example, depends on your DB server performance & etc).

from django.contrib.admin import ModelAdmin, register, SimpleListFilter
from django.db.models.functions import StrIndex, Concat
from django.db.models import Value as V
from natsort import natsorted

from .models import Item


class AlphanumericTruePythonSignatureFilter(SimpleListFilter):
    title = 'Signature (alphanumeric true python)'
    parameter_name = 'signature_alphanumeric_python'

    def lookups(self, request, model_admin):
        return (
            ('signature', 'Signature (alphanumeric)'),
        )

    def queryset(self, request, queryset):
        if self.value() == 'signature':
            all_ids = list(queryset.values_list('signature', flat=True))
            # let's use "!:!" as a separator for signature values
            all_ids_sorted = "!:!" + "!:!".join(natsorted(all_ids))

            return queryset.order_by(
                StrIndex(V(all_ids_sorted), Concat(V('!:!'), 'signature')),
            )


@register(Item)
class Item(ModelAdmin):
    list_filter = [AlphanumericTruePythonSignatureFilter]

And one more screenshot example for the case
Python side sorted signatures list using natsorted

5đź‘Ť

If you don’t mind to target a specific database, you can use RawSQL() to inject a SQL expression for parsing your “signature” field, then annotate the recordset with the result; for example (PostgreSQL):

queryset = (
    Item.objects.annotate(
        right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
    ).order_by('right_part')
)

(In case you needed to support different database formats, you could additionally detect the active engine and supply a suitable expression accordingly)

The nice thing about RawSQL() is that you make very explicit when and where you’re applying a database-specific feature.

As noted by @schillingt, Func() may also be an options.
On the other side, I would avoid extra() as it might be very well deprecated (see: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#extra).

Proof (for PostgreSQL):

class Item(models.Model):
    signature = models.CharField('Signatur', max_length=50)

    def __str__(self):
        return self.signature

-----------------------------------------------------

import django
from django.db.models.expressions import RawSQL
from pprint import pprint
from backend.models import Item


class ModelsItemCase(django.test.TransactionTestCase):

    def test_item_sorting(self):
        signatures = [
            'BA 1',
            'BA 10',
            'BA 100',
            'BA 2',
            'BA 1002',
            'BA 1000',
            'BA 1001',
        ]
        for signature in signatures:
            Item.objects.create(signature=signature)
        pprint(list(Item.objects.all()))
        print('')

        queryset = (
            Item.objects.annotate(
                right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
            ).order_by('right_part')
        )

        pprint(list(queryset))

        self.assertEqual(queryset[0].signature, 'BA 1')
        self.assertEqual(queryset[1].signature, 'BA 2')
        self.assertEqual(queryset[2].signature, 'BA 10')
        self.assertEqual(queryset[3].signature, 'BA 100')
        self.assertEqual(queryset[4].signature, 'BA 1000')
        self.assertEqual(queryset[5].signature, 'BA 1001')
        self.assertEqual(queryset[6].signature, 'BA 1002')

Result:

test_item_sorting (backend.tests.test_item.ModelsItemCase) ... [<Item: BA 1>,
 <Item: BA 10>,
 <Item: BA 100>,
 <Item: BA 2>,
 <Item: BA 1002>,
 <Item: BA 1000>,
 <Item: BA 1001>]

[<Item: BA 1>,
 <Item: BA 2>,
 <Item: BA 10>,
 <Item: BA 100>,
 <Item: BA 1000>,
 <Item: BA 1001>,
 <Item: BA 1002>]
ok

----------------------------------------------------------------------
Ran 1 test in 0.177s
👤Mario Orlandi

3đź‘Ť

A simple approach is to add another field that is used only for sorting:

class Item(models.Model):
    signature = models.CharField('Signatur', max_length=50)
    sort_string = models.CharField(max_length=60, blank=True, editable=False)

    class Meta:
        ordering = ['sort_string']

    def save(self, *args, **kwargs):
        parts = self.signature.split()
        parts[2] = "{:06d}".format(int(parts[2]))
        self.sort_string = "".join(parts)
        super().save(*args, **kwargs)

Depending on how often your data is updated vs read, this is probably very efficient. sort_string is calculated once whenever an Item is updated, but then it is available as a simple field whenever it is needed. It is simple to adjust the way sort_string is calculated to meet your exact requirements.

It may also be useful (particularly during development) to add a re-save action to your admin:

def re_save(modeladmin, request, queryset):
    for item in queryset:
        item.save()
re_save.short_description = "Re-save"

class ItemAdmin(admin.ModelAdmin):
    actions = [re_save, ]
    ....

so it is easy to trigger re-calculation.

👤Mark Bailey

2đź‘Ť

I’m assuming your signature field follows this pattern: AAA 123 letters followed by a space followed by numbers (int).

Item.objects.extra(select={
    's1': 'cast(split_part(signature, \' \', 2) as int)', 
    's2': 'split_part(signature, \' \', 1)'
}).order_by('s2', 's1')
👤Airith

1đź‘Ť

How do you get your naming BA 1, BA 1000 …etc, the easiest solution is to store your data as this, BA 0001, BA 0002, then use order by, that will work.
else you have to apply a mapper, with python in order to transform your list and reorder it with a python logic.

👤gxmad

1đź‘Ť

I figured this would be simple solution, but apparently it’s not. Kudos to you on the good question. This is the approach I suggest:

  • Read up on how others have solved this at the Postgres / DB level and determine the best way to handle it for yourself. Do you need a custom type, can you use a simple regex, etc
  • Depending on the above, implement that solution for Postgres in a Django migration. You may need to create a type which can be done via a custom SQL migration. Or maybe you need to create a function at the database level.
  • Utilize the new postgres artifact. This part will definitely be complicated. You may need use .extra or a Functo access the function or type.

This should be possible, but it’ll definitely involve some DB changes and non-typical django usage.

👤schillingt

1đź‘Ť

Elaborating further my previous proposal and the interesting solution given by @Alexandr Shurigin, I’m now suggesting another option.

This new solution splits “signature” into two field:

  • code: a variable length alfanumeric string
  • weigth: a numeric value, possibly with leading 0s to be ignored

Given:

    [
        'X 1',
        'XY 1',
        'XYZ 1',
        'BA 1',
        'BA 10',
        'BA 100',
        'BA 2',
        'BA 1002',
        'BA 1000',
        'BA 1001',
        'BA 003',
    ]

the expected result is:

    [
        'BA 1',
        'BA 2',
        'BA 003',
        'BA 10',
        'BA 100',
        'BA 1000',
        'BA 1001',
        'BA 1002',
        'X 1',
        'XY 1',
        'XYZ 1',
    ]

All computations are delegated to the database in a generic way, thanks to django.db.models.functions module.

    queryset = (
        Item.objects.annotate(
            split_index=StrIndex('signature', Value(' ')),
        ).annotate(
            left=Substr('signature', Value(1), 'split_index', output_field=CharField()),
            right=Substr('signature', F('split_index'), output_field=CharField()),
        ).annotate(
            code=Trim('left'),
            weight=Cast('right', output_field=IntegerField())
        ).order_by('code', 'weight')
    )

A more compact, but also less readable solution, is this:

    queryset = (
        Item.objects.annotate(
            split_index=StrIndex('signature', Value(' ')),
        ).annotate(
            code=Trim(Substr('signature', Value(1), 'split_index', output_field=CharField())),
            weight=Cast(Substr('signature', F('split_index'), output_field=CharField()), output_field=IntegerField())
        ).order_by('code', 'weight')
    )

What I’m really missing here is a “IndexOf” function to compute “split_index” as the position of either the first space OR digit, thus giving a really Natural Sort behaviour (to accept, for example “BA123” as well as “BA 123”)

Proof:

import django
#from django.db.models.expressions import RawSQL
from pprint import pprint
from backend.models import Item
from django.db.models.functions import Length, StrIndex, Substr, Cast, Trim
from django.db.models import Value, F, CharField, IntegerField


class ModelsItemCase(django.test.TransactionTestCase):

    def test_item_sorting(self):

        signatures = [
            'X 1',
            'XY 1',
            'XYZ 1',
            'BA 1',
            'BA 10',
            'BA 100',
            'BA 2',
            'BA 1002',
            'BA 1000',
            'BA 1001',
            'BA 003',
        ]
        for signature in signatures:
            Item.objects.create(signature=signature)
        print(' ')
        pprint(list(Item.objects.all()))
        print('')

        expected_result = [
            'BA 1',
            'BA 2',
            'BA 003',
            'BA 10',
            'BA 100',
            'BA 1000',
            'BA 1001',
            'BA 1002',
            'X 1',
            'XY 1',
            'XYZ 1',
        ]

        queryset = (
            Item.objects.annotate(
                split_index=StrIndex('signature', Value(' ')),
            ).annotate(
                code=Trim(Substr('signature', Value(1), 'split_index', output_field=CharField())),
                weight=Cast(Substr('signature', F('split_index'), output_field=CharField()), output_field=IntegerField())
            ).order_by('code', 'weight')
        )
        pprint(list(queryset))

        print(' ')
        print(str(queryset.query))
        self.assertSequenceEqual(
            [row.signature for row in queryset],
            expected_result
        )

The resulting query, for sqlite3 is:

SELECT 
    "backend_item"."id", 
    "backend_item"."signature", 
    INSTR("backend_item"."signature",  ) AS "split_index", 
    TRIM(SUBSTR("backend_item"."signature", 1, INSTR("backend_item"."signature",  ))) AS "code", 
    CAST(SUBSTR("backend_item"."signature", INSTR("backend_item"."signature",  )) AS integer) AS "weight" 
FROM "backend_item" 
ORDER BY "code" ASC, "weight" ASC

and for PostgreSQL:

SELECT 
    "backend_item"."id", 
    "backend_item"."signature", 
    STRPOS("backend_item"."signature",  ) AS "split_index", 
    TRIM(SUBSTRING("backend_item"."signature", 1, STRPOS("backend_item"."signature",  ))) AS "code", 
    (SUBSTRING("backend_item"."signature", STRPOS("backend_item"."signature",  )))::integer AS "weight" 
FROM "backend_item" 
ORDER BY "code" ASC, "weight" ASC
👤Mario Orlandi

0đź‘Ť

Assuming the format for the signature field is fixed (with a single space and second part is numeric: [^ ]+ \d+),
we can split it into two parts – base_name (string) and sig_value (int).

Also you don’t need the SimpleListFilter (it has a different purpose – to create filters!). You can simply override the get_queryset method:

from django.contrib import admin
from django.db.models import F, IntegerField, TextField, Value
from django.db.models.functions import Cast, StrIndex, Substr

from .models import Item

@admin.register(Item)
class ItemAdmin(admin.ModelAdmin):
    def get_queryset(self, request):
        qs = super(ItemAdmin, self).get_queryset(request)
        return qs.annotate(
            # 1-indexed position of space
            space=StrIndex("name", Value(" ")),

            # part of text before the space
            base_name=Substr("name", 1, F("space") - 1, output_field=TextField()),

            # cast part of text after the space as int
            sig_value=Cast(Substr("name", F("space")), IntegerField()),
        ).order_by("base_name", "sig_value")
👤jatinderjit

Leave a comment