15đź‘Ť
That’s not Django’s bug actually, that’s how databases work internally and for example looks like MySql for example doesn’t have natural sort by default (I googled not a lot, so maybe I am wrong there). But we can use some workaround for the case.
I put everything with examples & screenshots at https://gist.github.com/phpdude/8a45e1bd2943fa806aeffee94877680a
But basically for the given models.py
file
from django.db import models
class Item(models.Model):
signature = models.CharField('Signatur', max_length=50)
def __str__(self):
return self.signature
I’ve used admin.py
just for example with the correct filter implementation
from django.contrib.admin import ModelAdmin, register, SimpleListFilter
from django.db.models.functions import Length, StrIndex, Substr, NullIf, Coalesce
from django.db.models import Value as V
from .models import Item
class AlphanumericSignatureFilter(SimpleListFilter):
title = 'Signature (alphanumeric)'
parameter_name = 'signature_alphanumeric'
def lookups(self, request, model_admin):
return (
('signature', 'Signature (alphanumeric)'),
)
def queryset(self, request, queryset):
if self.value() == 'signature':
return queryset.order_by(
Coalesce(Substr('signature', V(0), NullIf(StrIndex('signature', V(' ')), V(0))), 'signature'),
Length('signature'),
'signature'
)
@register(Item)
class Item(ModelAdmin):
list_filter = [AlphanumericSignatureFilter]
Screenshots with examples
A few references:
- http://www.mysqltutorial.org/mysql-natural-sorting/
- https://docs.djangoproject.com/en/2.0/ref/contrib/admin/
PS: It looks like db function Length(column_name)
was added on Django 1.9, so you should be able to use it, but generally any Django version supports custom db ORM function call and you can call length()
function of the field.
Extra example with using Python library natsort
It will work, but requires to load all the possible signatures before for correct sort since it sorts the rows list using python side, not DB side.
It works. But it could be pretty slow in case of a big table size.
From my point of view it should be used only on db tables sizes less than 50 000 rows (for example, depends on your DB server performance & etc).
from django.contrib.admin import ModelAdmin, register, SimpleListFilter
from django.db.models.functions import StrIndex, Concat
from django.db.models import Value as V
from natsort import natsorted
from .models import Item
class AlphanumericTruePythonSignatureFilter(SimpleListFilter):
title = 'Signature (alphanumeric true python)'
parameter_name = 'signature_alphanumeric_python'
def lookups(self, request, model_admin):
return (
('signature', 'Signature (alphanumeric)'),
)
def queryset(self, request, queryset):
if self.value() == 'signature':
all_ids = list(queryset.values_list('signature', flat=True))
# let's use "!:!" as a separator for signature values
all_ids_sorted = "!:!" + "!:!".join(natsorted(all_ids))
return queryset.order_by(
StrIndex(V(all_ids_sorted), Concat(V('!:!'), 'signature')),
)
@register(Item)
class Item(ModelAdmin):
list_filter = [AlphanumericTruePythonSignatureFilter]
5đź‘Ť
If you don’t mind to target a specific database, you can use RawSQL() to inject a SQL expression for parsing your “signature” field, then annotate the recordset with the result; for example (PostgreSQL):
queryset = (
Item.objects.annotate(
right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
).order_by('right_part')
)
(In case you needed to support different database formats, you could additionally detect the active engine and supply a suitable expression accordingly)
The nice thing about RawSQL() is that you make very explicit when and where you’re applying a database-specific feature.
As noted by @schillingt, Func() may also be an options.
On the other side, I would avoid extra() as it might be very well deprecated (see: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#extra).
Proof (for PostgreSQL):
class Item(models.Model):
signature = models.CharField('Signatur', max_length=50)
def __str__(self):
return self.signature
-----------------------------------------------------
import django
from django.db.models.expressions import RawSQL
from pprint import pprint
from backend.models import Item
class ModelsItemCase(django.test.TransactionTestCase):
def test_item_sorting(self):
signatures = [
'BA 1',
'BA 10',
'BA 100',
'BA 2',
'BA 1002',
'BA 1000',
'BA 1001',
]
for signature in signatures:
Item.objects.create(signature=signature)
pprint(list(Item.objects.all()))
print('')
queryset = (
Item.objects.annotate(
right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
).order_by('right_part')
)
pprint(list(queryset))
self.assertEqual(queryset[0].signature, 'BA 1')
self.assertEqual(queryset[1].signature, 'BA 2')
self.assertEqual(queryset[2].signature, 'BA 10')
self.assertEqual(queryset[3].signature, 'BA 100')
self.assertEqual(queryset[4].signature, 'BA 1000')
self.assertEqual(queryset[5].signature, 'BA 1001')
self.assertEqual(queryset[6].signature, 'BA 1002')
Result:
test_item_sorting (backend.tests.test_item.ModelsItemCase) ... [<Item: BA 1>,
<Item: BA 10>,
<Item: BA 100>,
<Item: BA 2>,
<Item: BA 1002>,
<Item: BA 1000>,
<Item: BA 1001>]
[<Item: BA 1>,
<Item: BA 2>,
<Item: BA 10>,
<Item: BA 100>,
<Item: BA 1000>,
<Item: BA 1001>,
<Item: BA 1002>]
ok
----------------------------------------------------------------------
Ran 1 test in 0.177s
- Django.core.exceptions.ImproperlyConfigured
- What is the difference between the create and perform_create methods in Django rest-auth
- Celery – No module named five
- Testing a session variable
3đź‘Ť
A simple approach is to add another field that is used only for sorting:
class Item(models.Model):
signature = models.CharField('Signatur', max_length=50)
sort_string = models.CharField(max_length=60, blank=True, editable=False)
class Meta:
ordering = ['sort_string']
def save(self, *args, **kwargs):
parts = self.signature.split()
parts[2] = "{:06d}".format(int(parts[2]))
self.sort_string = "".join(parts)
super().save(*args, **kwargs)
Depending on how often your data is updated vs read, this is probably very efficient. sort_string
is calculated once whenever an Item is updated, but then it is available as a simple field whenever it is needed. It is simple to adjust the way sort_string
is calculated to meet your exact requirements.
It may also be useful (particularly during development) to add a re-save action to your admin:
def re_save(modeladmin, request, queryset):
for item in queryset:
item.save()
re_save.short_description = "Re-save"
class ItemAdmin(admin.ModelAdmin):
actions = [re_save, ]
....
so it is easy to trigger re-calculation.
2đź‘Ť
I’m assuming your signature field follows this pattern: AAA 123
letters followed by a space followed by numbers (int).
Item.objects.extra(select={
's1': 'cast(split_part(signature, \' \', 2) as int)',
's2': 'split_part(signature, \' \', 1)'
}).order_by('s2', 's1')
1đź‘Ť
How do you get your naming BA 1, BA 1000 …etc, the easiest solution is to store your data as this, BA 0001, BA 0002, then use order by, that will work.
else you have to apply a mapper, with python in order to transform your list and reorder it with a python logic.
- Making Twitter, Tastypie, Django, XAuth and iOS work to Build Django-based Access Permissions
- Run django app via nginx+uwsgi in a subpath
- Converting a django ValuesQuerySet to a json object
- Django: Generic views based 'as_view()' method
1đź‘Ť
I figured this would be simple solution, but apparently it’s not. Kudos to you on the good question. This is the approach I suggest:
- Read up on how others have solved this at the Postgres / DB level and determine the best way to handle it for yourself. Do you need a custom type, can you use a simple regex, etc
- Depending on the above, implement that solution for Postgres in a Django migration. You may need to create a type which can be done via a custom SQL migration. Or maybe you need to create a function at the database level.
- Utilize the new postgres artifact. This part will definitely be complicated. You may need use
.extra
or aFunc
to access the function or type.
This should be possible, but it’ll definitely involve some DB changes and non-typical django usage.
- Why does Django South require a default value when removing a field?
- Django: How to check if something is an email without a form
- Django: How to keep the test database when the test is finished?
- Does changing a django models related_name attribute require a south migration?
1đź‘Ť
Elaborating further my previous proposal and the interesting solution given by @Alexandr Shurigin, I’m now suggesting another option.
This new solution splits “signature” into two field:
- code: a variable length alfanumeric string
- weigth: a numeric value, possibly with leading 0s to be ignored
Given:
[
'X 1',
'XY 1',
'XYZ 1',
'BA 1',
'BA 10',
'BA 100',
'BA 2',
'BA 1002',
'BA 1000',
'BA 1001',
'BA 003',
]
the expected result is:
[
'BA 1',
'BA 2',
'BA 003',
'BA 10',
'BA 100',
'BA 1000',
'BA 1001',
'BA 1002',
'X 1',
'XY 1',
'XYZ 1',
]
All computations are delegated to the database in a generic way, thanks to django.db.models.functions module.
queryset = (
Item.objects.annotate(
split_index=StrIndex('signature', Value(' ')),
).annotate(
left=Substr('signature', Value(1), 'split_index', output_field=CharField()),
right=Substr('signature', F('split_index'), output_field=CharField()),
).annotate(
code=Trim('left'),
weight=Cast('right', output_field=IntegerField())
).order_by('code', 'weight')
)
A more compact, but also less readable solution, is this:
queryset = (
Item.objects.annotate(
split_index=StrIndex('signature', Value(' ')),
).annotate(
code=Trim(Substr('signature', Value(1), 'split_index', output_field=CharField())),
weight=Cast(Substr('signature', F('split_index'), output_field=CharField()), output_field=IntegerField())
).order_by('code', 'weight')
)
What I’m really missing here is a “IndexOf” function to compute “split_index” as the position of either the first space OR digit, thus giving a really Natural Sort behaviour (to accept, for example “BA123” as well as “BA 123”)
Proof:
import django
#from django.db.models.expressions import RawSQL
from pprint import pprint
from backend.models import Item
from django.db.models.functions import Length, StrIndex, Substr, Cast, Trim
from django.db.models import Value, F, CharField, IntegerField
class ModelsItemCase(django.test.TransactionTestCase):
def test_item_sorting(self):
signatures = [
'X 1',
'XY 1',
'XYZ 1',
'BA 1',
'BA 10',
'BA 100',
'BA 2',
'BA 1002',
'BA 1000',
'BA 1001',
'BA 003',
]
for signature in signatures:
Item.objects.create(signature=signature)
print(' ')
pprint(list(Item.objects.all()))
print('')
expected_result = [
'BA 1',
'BA 2',
'BA 003',
'BA 10',
'BA 100',
'BA 1000',
'BA 1001',
'BA 1002',
'X 1',
'XY 1',
'XYZ 1',
]
queryset = (
Item.objects.annotate(
split_index=StrIndex('signature', Value(' ')),
).annotate(
code=Trim(Substr('signature', Value(1), 'split_index', output_field=CharField())),
weight=Cast(Substr('signature', F('split_index'), output_field=CharField()), output_field=IntegerField())
).order_by('code', 'weight')
)
pprint(list(queryset))
print(' ')
print(str(queryset.query))
self.assertSequenceEqual(
[row.signature for row in queryset],
expected_result
)
The resulting query, for sqlite3 is:
SELECT
"backend_item"."id",
"backend_item"."signature",
INSTR("backend_item"."signature", ) AS "split_index",
TRIM(SUBSTR("backend_item"."signature", 1, INSTR("backend_item"."signature", ))) AS "code",
CAST(SUBSTR("backend_item"."signature", INSTR("backend_item"."signature", )) AS integer) AS "weight"
FROM "backend_item"
ORDER BY "code" ASC, "weight" ASC
and for PostgreSQL:
SELECT
"backend_item"."id",
"backend_item"."signature",
STRPOS("backend_item"."signature", ) AS "split_index",
TRIM(SUBSTRING("backend_item"."signature", 1, STRPOS("backend_item"."signature", ))) AS "code",
(SUBSTRING("backend_item"."signature", STRPOS("backend_item"."signature", )))::integer AS "weight"
FROM "backend_item"
ORDER BY "code" ASC, "weight" ASC
- Django Can't Find My Templates
- Is there a way to set the id value of new Django objects to start at a certain value?
0đź‘Ť
Assuming the format for the signature field is fixed (with a single space and second part is numeric: [^ ]+ \d+
),
we can split it into two parts – base_name (string) and sig_value (int).
Also you don’t need the SimpleListFilter
(it has a different purpose – to create filters!). You can simply override the get_queryset
method:
from django.contrib import admin
from django.db.models import F, IntegerField, TextField, Value
from django.db.models.functions import Cast, StrIndex, Substr
from .models import Item
@admin.register(Item)
class ItemAdmin(admin.ModelAdmin):
def get_queryset(self, request):
qs = super(ItemAdmin, self).get_queryset(request)
return qs.annotate(
# 1-indexed position of space
space=StrIndex("name", Value(" ")),
# part of text before the space
base_name=Substr("name", 1, F("space") - 1, output_field=TextField()),
# cast part of text after the space as int
sig_value=Cast(Substr("name", F("space")), IntegerField()),
).order_by("base_name", "sig_value")
- Django models across multiple projects/microservices. How to?
- Is there a Django template tag that lets me set a context variable?