From: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fuzzy substring searching with the pg_trgm extension |
Date: | 2016-02-10 15:34:11 |
Message-ID: | 56BB5873.1020503@postgrespro.ru |
Lists: | pgsql-hackers |
On 02.02.2016 15:45, Artur Zakirov wrote:
> On 01.02.2016 20:12, Artur Zakirov wrote:
>>
>> I have changed the patch:
>> 1 - trgm2.data was corrected, duplicates were deleted.
>> 2 - I have added the operators <<-> and <->> with GiST index support. A
>> regression test will pass only with the patch
>> http://www.postgresql.org/message-id/CAPpHfdt19FwQXarYjkzxb3oxmv-KAn3FLuZrooARE_U3H3CV9g@mail.gmail.com
>>
>>
>> 3 - the function substring_similarity() was renamed to
>> subword_similarity().
>>
>> But there is no substring_similarity_pos() function yet. It is not
>> trivial.
>>
>
> Sorry, there was a typo in the previous patch. Here is the fixed patch.
>
I have attached a new version of the patch. It fixes errors in the operators
<->> and %>:
- operator <->> did not pass the regression test on 32-bit CentOS (gcc
4.4.7 20120313).
- operator %> did not pass the regression test on 32-bit FreeBSD (gcc
4.2.1 20070831).
Both failures were caused by gcc's variable optimization.
This patch also corrects the pg_trgm documentation. The operators are now
written as %> and <->> (not <% and <<->).
There is a problem with adding the substring_similarity_pos() function: it
can bring additional overhead, because we would also need to store character
positions, including spaces. Spaces between words are lost in the
current implementation.
Is this function actually needed?
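To illustrate why positions cannot be recovered, here is a minimal sketch in
Python. It is not the C code from the patch. It only follows the trigram
extraction as the pg_trgm documentation describes it: the text is lowercased,
split into alphanumeric words, and each word is padded with two spaces before
and one space after. The names trigrams() and similarity() are hypothetical,
and the ASCII-only word pattern is a simplifying assumption.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, roughly as pg_trgm documents them.

    Assumption: words are runs of ASCII letters and digits. The real
    extension relies on the database locale to classify characters.
    """
    result = set()
    # Non-alphanumeric characters only separate words; they are not stored.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# The amount of whitespace between words does not change the result,
# so character offsets in the original string cannot be derived from it.
print(trigrams("fuzzy search") == trigrams("fuzzy     search"))  # True
```

Since the result is a plain set built word by word, neither the original
spacing nor the word offsets survive. A position-returning function would have
to carry that information separately, which is the overhead mentioned above.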
In conclusion, this patch introduces:
1 - functions:
- subword_similarity()
2 - operators:
- %>
- <->>
3 - GUC variables:
- pg_trgm.sml_limit
- pg_trgm.subword_limit
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
pg_trgm_guc_v2.patch | text/x-patch | 8.8 KB |
pg_trgm_subword_v7.patch | text/x-patch | 112.8 KB |