From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_trgm version 1.2 |
Date: | 2015-06-29 19:34:36 |
Message-ID: | CAHyXU0z1Ht8r276OMrZZ01_g3j+C6eey+1jFFkzwbV62v9bRcQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jun 29, 2015 at 7:23 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Sat, Jun 27, 2015 at 5:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache
>>
>> V1.2: Time: 2.839 ms --- after repeated execution to warm the cache
>
> Wow! I'm going to test this. I have some data sets for which trigram
> searching isn't really practical...if the search string touches
> trigrams with a lot of duplication the algorithm can have trouble
> beating brute force searches.
>
> trigram searching is important: it's the only way currently to search
> string encoded structures for partial strings quickly.
I ran your patch against stock 9.4 and am happy to confirm massive
speedups of pg_trgm; results of 90% reduction in runtime are common.
Also, with the new changes it's hard to get the indexed search to
significantly underperform brute force searching which is a huge
improvement vs the stock behavior, something that made me very wary of
using these kinds of searches in the past.
datatable: 'test2'
rows: ~ 2 million
heap size: 3.3GB (includes several unrelated fields)
index size: 1GB
9.4: stock
9.5: patched
match 50% rows, brute force seq scan
9.4: 11.5s
9.5: 9.1s
match 50% rows, indexed (time is quite variable with 9.4 giving > 40 sec times)
9.4: 21.0s
9.5: 11.8s
match 1% rows, indexed (>90% time reduction!)
9.4: .566s
9.5: .046s
match .1% rows, one selective one non-selective search term, selective
term first
9.4: .563s
9.5: .028s
match .1% rows, one selective one non-selective search term, selective term last
9.4: 1.014s
9.5: 0.093s
very nice! Recently, I examined pg_tgrm for an attribute searching
system -- it failed due to response time variability and lack of tools
to control that. Were your patch in place, I would have passed it. I
had a 'real world' data set though. With this, pg_trgm is basically
outperforming SOLR search engine for all cases we're interested in
whereas before low selectivity cases where having all kinds of
trouble.
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2015-06-29 19:48:40 | Re: Rework the way multixact truncations work |
Previous Message | Josh Berkus | 2015-06-29 17:40:56 | Re: Support for N synchronous standby servers - take 2 |