1👍
Sorry if this is obvious and wrong (that no-one has suggested it in 4 hours is worrying!), but why not search for all matches first, run one batch query for everything (easy once you have all the matches), and then call sub with a replacement function that pulls its data from the resulting dictionary?
You have to run the regexp twice, but the database access seems to be the expensive part anyway.
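A minimal sketch of that two-pass idea, using only the standard library. The pattern, the batch_lookup helper and the example.com URL are hypothetical stand-ins (batch_lookup takes the place of the single batched database query); they are not taken from the question.

```python
import re

# Hypothetical simplified pattern: volume number, reporter abbreviation, page number.
CITATION_RE = re.compile(
    r'(?P<volume>\d+)\s+(?P<reporter>[A-Z][A-Za-z.]*)\s+(?P<page>\d+)'
)


def batch_lookup(keys):
    """Hypothetical stand-in for one batched database query."""
    known = {('410', 'U.S.', '113'): 'https://example.com/410-us-113'}
    return {key: known[key] for key in keys if key in known}


def link_citations(text):
    # Pass 1: collect every unique (volume, reporter, page) key.
    keys = {m.group('volume', 'reporter', 'page')
            for m in CITATION_RE.finditer(text)}
    # One lookup for all keys instead of one query per match.
    urls = batch_lookup(keys)

    # Pass 2: sub() with a function that only reads from the dict.
    def replace(m):
        url = urls.get(m.group('volume', 'reporter', 'page'))
        if url is None:
            return m.group()  # leave unknown citations untouched
        return '<a href="%s">%s</a>' % (url, m.group())

    return CITATION_RE.sub(replace, text)
```

Calling link_citations on a string wraps only the citations that the lookup knows about and returns everything else unchanged.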
1👍
You can do it with a single regexp pass by using finditer, which returns match objects.
Each match object has:
- a method returning a dict of the named groups, groupdict()
- the start and end positions of the match in the original text, span()
- the original matching text, group()
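To see those three accessors in action, here is a small sketch. The pattern and the sample sentence are hypothetical and much simpler than the regex used in the implementation in this answer.

```python
import re

# Hypothetical simplified pattern with three named groups.
pattern = re.compile(
    r'(?P<volume>\d+)\s+(?P<reporter>[A-Z][A-Za-z.]*)\s+(?P<page>\d+)'
)
text = "See 410 U.S. 113 for details."

# finditer yields one match object per non-overlapping match.
matches = list(pattern.finditer(text))
m = matches[0]

print(m.groupdict())  # the named groups as a dict
print(m.span())       # (start, end) offsets into text
print(m.group())      # the matched substring itself
```

Because span() gives offsets into the original text, text[start:end] is always equal to group().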
So I would suggest that you:
- Make a list of all the matches in your text using finditer
- Make a list of all the unique (volume, reporter, page) triplets in the matches
- Look up those triplets
- Correlate each match object with the result of the triplet lookup, if found
- Process the original text, splitting it at the match spans and interpolating the lookup results.
I’ve implemented the database lookup by OR-ing together a list of Q objects: Q(volume=foo1,reporter=bar1,page=baz1)|Q(volume=foo2,reporter=bar2,page=baz2)... There may be more efficient approaches.
Here’s an untested implementation:
import re
from collections import namedtuple
from functools import reduce  # reduce is not a builtin in Python 3

from django.db.models import Q

# Citations is assumed to be your Django model with volume, reporter,
# page and url fields; import it from your own app.

Triplet = namedtuple('Triplet', ['volume', 'reporter', 'page'])

def lookup_references(matches):
    match_to_triplet = {}
    triplet_to_url = {}
    for m in matches:
        group_dict = m.groupdict()
        if any(not x for x in group_dict.values()):  # Filter out matches we don't want to look up
            continue
        match_to_triplet[m] = Triplet(**group_dict)
    if not match_to_triplet:  # Nothing to look up; reduce() would fail on an empty list
        return {}
    # Build query
    unique_triplets = set(match_to_triplet.values())
    # List of Q objects
    q_list = [Q(**trip._asdict()) for trip in unique_triplets]
    # Consolidated Q
    single_q = reduce(Q.__or__, q_list)
    for row in Citations.objects.filter(single_q).values('volume', 'reporter', 'page', 'url'):
        url = row.pop('url')
        # Assumes the fields are stored as strings, like the regex groups
        triplet_to_url[Triplet(**row)] = url
    # Now pair original match objects with URL where found
    lookups = {}
    for match, triplet in match_to_triplet.items():
        if triplet in triplet_to_url:
            lookups[match] = triplet_to_url[triplet]
    return lookups

def interpolate_citation_matches(text, matches, lookups):
    result = []
    prev = 0
    for m in matches:
        m_start, m_end = m.span()
        if prev != m_start:
            result.append(text[prev:m_start])
        # Now check match
        if m in lookups:
            result.append('<a href="%s">%s</a>' % (lookups[m], m.group()))
        else:
            result.append(m.group())
        prev = m_end  # Continue after this match
    # Append whatever follows the last match (the whole text if there were none)
    result.append(text[prev:])
    return ''.join(result)

def process_citations(text):
    citation_regex = r'(?P<volume>[0-9]+[a-zA-Z]{0,3})\s+(?P<reporter>[A-Z][a-zA-Z0-9\.\s]{1,49}?)\s+(?P<page>[0-9]+[a-zA-Z]{0,3})'
    matches = list(re.finditer(citation_regex, text))
    lookups = lookup_references(matches)
    new_text = interpolate_citation_matches(text, matches, lookups)
    return new_text
Source:stackexchange.com