[Solved]-Is there a way to filter a django queryset based on string similarity (a la python difflib)?

2👍

soundex won’t help you, because it’s a phonetic algorithm. Joe and Joseph aren’t similar phonetically, so soundex won’t mark them as similar.

You can try Levenshtein distance, which is implemented in PostgreSQL. Maybe in your database too and if not, you should be able to write a stored procedure, which will calculate the distance between two strings and use it in your computation.

2👍

It’s possible with trigram_similar lookups since Django 1.10, see docs for PostgreSQL specific lookups and Full text search

2👍

As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:

levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int

andilabs answer can use the only second function. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite function like this:

from django.db.models import Func

class Levenshtein(Func):
    template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
    function = 'levenshtein'

    def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            ins_cost=ins_cost,
            del_cost=del_cost,
            sub_cost=sub_cost,
            **extras
        )

And call the function:

from django.db.models import F

Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1)  # ins = 3, del = 3, sub = 1
).filter(
    lev_dist__lte=2
)

1👍

If you need getting there with django and postgres and don’t want to use introduced in 1.10 trigram-similarity https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity you can implement using Levensthein like these:

Extension needed fuzzystrmatch

you need adding postgres extension to your db in psql:

CREATE EXTENSION fuzzystrmatch;

Lets define custom function with wich we can annotate queryset. It just take one argument the search_term and uses postgres levenshtein function (see docs):

from django.db.models import Func

class Levenshtein(Func):
    template = "%(function)s(%(expressions)s, '%(search_term)s')"
    function = "levenshtein"

    def __init__(self, expression, search_term, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            **extras
        )

then in any other place in project we just import defined Levenshtein and F to pass the django field.

from django.db.models import F

Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
    lev_dist__lte=2
)

Leave a comment