From: | raylu <lurayl(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Text search lexer's handling of hyphens and negatives |
Date: | 2019-10-15 21:51:37 |
Message-ID: | CAPD=2WFRaQk9LhocKPB-ewX9spRnWJwy=wFm8sm5L0yPqny1vA@mail.gmail.com |
Lists: | pgsql-general |
(I sent a similar message before subscribing to the list but it hasn't
gone through yet, so sorry if you see a duplicate of this...)
We've been happily using pgsql to store user-generated documents for a
while now. We also wanted to be able to search the documents so we
tossed the document contents into a tsvector and did a pretty
straightforward contents @@ phraseto_tsquery('simple', 'the query').
Our users have a lot of things named like ABC-DEF-GHI so that sort of
hyphenated name appears in their documents fairly often.
to_tsvector('simple', 'ABC-DEF-GHI') @@ phraseto_tsquery('simple',
'ABC-DEF-GHI') works without issue.
Sometimes, these hyphenated names have numbers in them like
UVW-789-XYZ. Still no problem with to_tsvector/phraseto_tsquery.
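For reference, a minimal check of the two full-name cases above, runnable in psql. The column aliases are illustrative names, not anything from our schema.

```sql
-- Full-name phrase search against a hyphenated name made of letters only,
-- and against one with a numeric middle part.
SELECT to_tsvector('simple', 'ABC-DEF-GHI')
         @@ phraseto_tsquery('simple', 'ABC-DEF-GHI') AS letters_match,
       to_tsvector('simple', 'UVW-789-XYZ')
         @@ phraseto_tsquery('simple', 'UVW-789-XYZ') AS digits_match;
```

Both sides go through the same parser here, so whatever tokens the document produces, the query produces the same ones.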
Sometimes, users can only remember the last bit of the name. So they'd
like to find the document with ABC-DEF-GHI in it by searching for
'DEF-GHI'. Since to_tsvector('simple', 'ABC-DEF-GHI') is
'abc-def-ghi':1 'abc':2 'def':3 'ghi':4
we search for to_tsquery('simple', 'def <-> ghi') instead of using
phraseto_tsquery. This works, but you can probably see where this is
going.
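A small sketch of the suffix search just described, with the vector and the query shown next to the match result so the lexeme positions can be compared. The column aliases are illustrative only.

```sql
-- Search for the trailing part 'DEF-GHI' of the name 'ABC-DEF-GHI'.
-- The query is written by hand as adjacent lexemes (def followed by ghi).
SELECT to_tsvector('simple', 'ABC-DEF-GHI') AS doc_vector,
       to_tsquery('simple', 'def <-> ghi')  AS suffix_query,
       to_tsvector('simple', 'ABC-DEF-GHI')
         @@ to_tsquery('simple', 'def <-> ghi') AS matches;
```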
to_tsvector('simple', 'UVW-789-XYZ') is
'uvw':1 '-789':2 'xyz':3
because the parser reads -789 as a negative integer. If we turn the query
'789-XYZ' into a tsquery as before, we get to_tsquery('simple', '789 <-> xyz'),
which doesn't match because the document's lexeme is '-789', not '789'.
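To see how the parser classifies each piece, ts_debug can be run on the same input. This is only a diagnostic sketch: the parser's token types include int (signed integer) and uint (unsigned integer), and the alias column shows which one each token was assigned.

```sql
-- Show each token, its parser token type (alias), and the resulting lexemes.
SELECT alias, token, lexemes
FROM ts_debug('simple', 'UVW-789-XYZ');

-- The suffix query built as before, checked against the same document.
SELECT to_tsvector('simple', 'UVW-789-XYZ')
         @@ to_tsquery('simple', '789 <-> xyz') AS matches;
```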
Are we missing something here? Is there either a way to
1. generate tsvectors without this special (negative) integer behavior or
2. generate tsqueries in a more intelligent way?
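As an illustration of what option 1 might look like, here is a rough sketch that replaces hyphens with spaces on both the document side and the query side before parsing. This is an assumption for discussion, not something from our current setup, and it presumes hyphens only ever separate name parts.

```sql
-- Assumption: hyphens carry no meaning beyond separating name parts,
-- so both the document and the search text have them replaced by spaces.
SELECT to_tsvector('simple', replace('UVW-789-XYZ', '-', ' ')) AS doc_vector,
       phraseto_tsquery('simple', replace('789-XYZ', '-', ' ')) AS suffix_query,
       to_tsvector('simple', replace('UVW-789-XYZ', '-', ' '))
         @@ phraseto_tsquery('simple', replace('789-XYZ', '-', ' ')) AS matches;
```

The trade-off is that the combined lexeme for the whole hyphenated name (such as 'abc-def-ghi') is no longer produced, and any genuine negative numbers in the documents would lose their sign.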