From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_trgm version 1.2 |
Date: | 2015-07-20 14:17:49 |
Message-ID: | CAPpHfdskHSDixhh6-95vDefNwUuwTRYwnNs0xfRCE6BtSxUw+Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 15, 2015 at 12:31 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>>
>>
>> See Tom Lane's comment about downgrade scripts. I think just remove it is
>> a right solution.
>>
>
> The new patch removes the downgrade path and the ability to install the
> old version.
>
> (If anyone wants an easy downgrade path for testing, they can keep using
> the prior patch--no functional changes)
>
> It also added a comment before the trigramsMatchGraph call.
>
> I retained the palloc and the loop to promote the ternary array to a
> binary array. While I also think it is tempting to get rid of that by
> abusing the type system and would do it that way in my own standalone code,
> it seems contrary to way the project usually does things. And I couldn't
> measure a difference in performance.
>
Ok!
> ....
>
>
>
>> Let's consider '^(?!.*def).*abc' regular expression as an example. It
>> matches strings which contains 'abc' and don't contains 'def'.
>>
>> # SELECT 'abc' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> # SELECT 'def abc' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> f
>> (1 row)
>>
>> # SELECT 'abc def' ~ '^(?!.*def).*abc';
>> ?column?
>> ----------
>> f
>> (1 row)
>>
>> Theoretically, our trigram regex processing could support negative
>> matching of 'def' trigram, i.e. trigramsMatchGraph(abc = true, def = false)
>> = true but trigramsMatchGraph(abc = true, def = true) = false. Actually, it
>> doesn't because trigramsMatchGraph() implements a monotonic function. I
>> just think it should be stated explicitly.
>>
>
> Do you think it is likely to change to stop being monotonic and so support
> the (def=GIN_TRUE) => false case?
>
> ^(?!.*def) seems like a profoundly niche situation. (Although one that
> I might actually start using myself now that I know it isn't just a
> Perl-ism).
>
> It doesn't make any difference to this patch, other than perhaps how to
> word the comments.
>
Yes, it was just about the comments.
I also run few tests on real-life dataset: set of dblp paper titles. As it
was expected, there is huge speedup when pattern is long enough.
# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
count
-------
318
(1 row)
Time: 29,114 ms
# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
count
-------
318
(1 row)
Time: 26,273 ms
# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
count
-------
318
(1 row)
Time: 249,725 ms
# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
count
-------
318
(1 row)
Time: 218,627 ms
For me, patch is in the pretty good shape. I'm going to mark it "Ready for
committer".
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2015-07-20 14:53:29 | Re: First Aggregate Funtion? |
Previous Message | Heikki Linnakangas | 2015-07-20 14:15:02 | Re: All-zero page in GIN index causes assertion failure |