[Solved]-Combine trigram with ranked searching in django 1.10

22👍

We investigated more thoroughly to understand how search weights work.

According to the documentation, weights can be assigned to the fields being searched, and trigrams can similarly be used to filter by similarity or distance.

However, the documentation does not give an example of using the two together, and on further investigation it does not explain much about how weights work either.

A little logic tells us that if we search for a word common to all records, all ranks will be 0. Similarity varies much more than rank, but it tends to take lower values than rank.
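As a rough illustration of why similarity takes such varied values, here is a plain-Python sketch of how, as far as we understand, the PostgreSQL pg_trgm extension computes it: each word is lowercased, padded with two spaces in front and one behind, cut into three-character pieces, and the similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are ours, and the word splitting is only an approximation of what the database does.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, approximating pg_trgm's rules."""
    result = set()
    # Assumption: words are runs of letters and digits; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity("cambiar", "cambio")
```

Under these assumptions 'cambiar' and 'cambio' share 5 of 10 distinct trigrams, so a near miss still gets a score between 0 and 1 instead of dropping straight to 0.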

Now, as far as we understand, text search is carried out based on the text contained in the fields you want to filter, even more than on the language set in the configuration. As an example, the model we used had a title field and a content field, and the most common words in the titles were 'como' and 'cambiar' ('how' and 'change'). Ranks work as part of the query, so we can use values or values_list to review the ranks and similarities, which are numeric values, and we can see the weighted words by viewing the vector object. Reviewing the weighted words, we saw that weights had indeed been assigned, but to truncated forms of the words: we found 'perfil' and 'cambi', but not 'cambiar' or 'como'. However, all the records contained the same text, 'lorem ipsum …', and all the words of that sentence were stored whole and with weight B. We conclude from this that searches are done based on the contents of the fields being filtered more than on the language with which we configure searches.

That said, here is the code we used for everything.

First, to use trigrams, we need to enable the necessary extension in the database:

from django.db import migrations
from django.contrib.postgres.operations import UnaccentExtension, TrigramExtension

class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
      ...
      TrigramExtension(),
      UnaccentExtension(),

    ]

Import the operations from the postgres package and run them from any migration file.

The next step is to change the code from the question so that the filter returns the results of one query if the other fails:

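from django.conf import settings
from django.contrib.postgres.search import (
    SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
)
from django.db.models import Q

def get_queryset(self):
    # Keep the raw string: TrigramSimilarity compares text, not a SearchQuery.
    query = self.request.GET.get('q', '')
    search_query = SearchQuery(query)

    # Matches in 'name' (weight A) count for more than matches in 'content' (weight B).
    vector = SearchVector(
        'name',
        weight='A',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    ) + SearchVector(
        'content',
        weight='B',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    )

    if self.request.user.is_authenticated:
        queryset = Article.actives.all()
    else:
        queryset = Article.publics.all()

    # Keep rows that pass either the rank test or the similarity test.
    return queryset.annotate(
        rank=SearchRank(vector, search_query),
        similarity=TrigramSimilarity(
            'name', query
        ) + TrigramSimilarity(
            'content', query
        ),
    ).filter(Q(rank__gte=0.3) | Q(similarity__gt=0.3)).order_by('-rank')[:20]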

The problem with the code in the question was that it filtered with one query after the other, and if the chosen word did not appear in either of the two searches the problem was greater. We use a Q object to filter with an OR connector so that if one of the two conditions does not return the desired value, the other is used in its place.
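The effect of the OR filter can be checked without a database. The following is a plain-Python sketch, not Django code: the rows and their rank and similarity values are made up for illustration, and filter_results is a hypothetical helper that mimics .filter(Q(rank__gte=0.3) | Q(similarity__gt=0.3)).order_by('-rank')[:20].

```python
def filter_results(rows, threshold=0.3, limit=20):
    """Keep rows where rank >= threshold OR similarity > threshold, best rank first."""
    kept = [
        row for row in rows
        if row["rank"] >= threshold or row["similarity"] > threshold
    ]
    kept.sort(key=lambda row: row["rank"], reverse=True)
    return kept[:limit]


# Hypothetical annotated rows; the numbers are illustrative only.
rows = [
    {"name": "trigram match only", "rank": 0.0, "similarity": 0.5},
    {"name": "no match", "rank": 0.0, "similarity": 0.05},
    {"name": "full-text match only", "rank": 0.6, "similarity": 0.1},
]

names = [row["name"] for row in filter_results(rows)]
```

A row that only the trigram comparison catches has rank 0, so ordering by rank alone places it after every full-text match.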

This is enough; however, deeper clarifications on how these weights and trigrams work are welcome, to make the most of this new feature offered by the latest version of Django.

0👍

Something like this will work for you; it is a search form for a blog. However, I don't know why TrigramSimilarity only works on titles and doesn't do the same on the body.

search_vector = SearchVector('title', weight='A') + SearchVector('body', weight='B')
search_query = SearchQuery(query)
rank = SearchRank(search_vector, search_query)

# First try the ranked full-text search.
results = Post.published.annotate(rank=rank).filter(rank__gte=0.2).order_by('-rank')

# Fall back to trigram similarity on the title when the ranked search finds nothing.
if not results:
    results = Post.published.annotate(
        similarity=TrigramSimilarity('title', query),
    ).filter(similarity__gte=0.1).order_by('-similarity')
