From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fuzzy substring searching with the pg_trgm extension |
Date: | 2015-12-27 05:12:05 |
Message-ID: | CAMkU=1wFCbcDL7WOrSuwCf=a41z9vU2F514+UDMwOEg+2FTCHw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Dec 18, 2015 at 11:43 AM, Artur Zakirov
<a(dot)zakirov(at)postgrespro(dot)ru> wrote:
> Hello.
>
> PostgreSQL has a contrib module named pg_trgm. It is used for fuzzy text
> search. It provides functions and operators for determining the
> similarity of given texts using trigram matching.
>
> At the moment, in pg_trgm both the similarity function and the % operator
> compare two strings expecting them to be similar in their entirety. They
> give bad results if we want to find documents by a query which is a
> substring of a document.
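
To make the problem in the quoted text concrete, here is a rough Python sketch of whole-string trigram similarity. It only approximates pg_trgm (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the size of the union). The helper names `trigrams` and `similarity` and the sample strings are made up for illustration; this is not the extension's actual code.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (an assumption, not the real C code)."""
    result = set()
    # Lowercase, then treat every run of ASCII letters/digits as a word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Pad with two leading spaces and one trailing space.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


# Hypothetical sample data: the query is a substring of the longer document.
query = "trigram"
short_doc = "trigram"
long_doc = ("PostgreSQL has a contrib module that uses trigram "
            "matching for fuzzy text search")

print(similarity(query, short_doc))  # identical strings
print(similarity(query, long_doc))   # query is only a small part of the document
```

Every trigram of the query occurs in the long document, yet the score drops well below the default limit of 0.3 because the union is dominated by the document's other trigrams. The longer the document, the lower the score for the same substring.
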
This is very interesting. I suspect the index will not be very useful
in cases where the full string is much larger than the substring,
because the limit will not be met often enough to rule out many rows
based on the index data alone. I have a pretty good test case to check
this with.
Can you update the patch to incorporate the recent changes committed
under the thread "Patch: pg_trgm: gin index scan performance for
similarity search"? They conflict with your changes.
Thanks,
Jeff