From: | Sumit Raja <sumit(dot)raja(at)raja-consulting(dot)co(dot)uk> |
---|---|
To: | pgsql <pgsql-general(at)postgresql(dot)org> |
Subject: | FTS for a controlled vocab |
Date: | 2012-10-10 16:25:51 |
Message-ID: | CAB4mO2d=s0KiFy_7fkExexarkMES8Y=0nBFa_SC+YQCYWQmDDQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I am trying to identify how best to handle the situation where a controlled
vocabulary needs to be searched on using full text search.
I have a list of brand names that have, what FTS deems, blank characters in
them that I need to search against. E.g. (+)people, D&G, 100% Design.
These particular combinations are proving to be difficult so I would like
to do a replacement at index and query time
(+)people, +people -> pluspeople
100% Design -> 100percent Design
D&G, D & G, DG -> DandG
Running these through the default parser means I get a much reduced lexemes
that won't be exact enough.
alias | description | token | dictionaries | dictionary |
lexemes
-----------+-----------------+-------+----------------+--------------+---------
asciiword | Word, all ASCII | d | {english_stem} | english_stem | {d}
blank | Space symbols | & | {} | |
asciiword | Word, all ASCII | g | {english_stem} | english_stem | {g}
(3 rows)
alias | description | token | dictionaries | dictionary |
lexemes
-----------+-----------------+--------+----------------+--------------+---------
blank | Space symbols | ( | {} | |
blank | Space symbols | +) | {} | |
asciiword | Word, all ASCII | people | {english_stem} | english_stem |
{peopl}
Can I achieve this with FTS and dictionaries or would I need a custom
parser? Any other ideas on how a search like this could work?
I have considered using the actual text column in the query to try and
match the exact term using ilike and pg_trgm. So for a user query of 'D&G
dresses' the select could be:
select * from test where text_val @@ plainto_tsquery('english','d&g
dresses') and lex ilike'%d&g%';
but there would be some horrible query mangling to find all words that have
the blank tokens and use them in multiple ilike comparisons.
Thanks
Sumit
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Ernst | 2012-10-10 16:35:06 | Re: pg_upgrade not detecting version properly |
Previous Message | Joe Van Dyk | 2012-10-10 16:22:58 | Shorthand syntax for triggers |