From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: pg_trgm performance |
Date: | 2007-02-26 12:53:16 |
Message-ID: | Pine.LNX.4.64.0702261552120.400@sn.sai.msu.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, 26 Feb 2007, Guillaume Smet wrote:
> On 2/24/07, Steinar H. Gunderson <sgunderson(at)bigfoot(dot)com> wrote:
>
> Thanks for your time.
>
>> GiN version, short:
>> -> Bitmap Heap Scan on tags (cost=8.64..151.79 rows=41 width=0)
>> (actual time=5.555..30.157 rows=7 loops=1)
>> Filter: (title % 'foo'::text)
>> -> Bitmap Index Scan on trgm_idx (cost=0.00..8.63 rows=41
>> width=0) (actual time=2.857..2.857 rows=5555 loops=1)
>> Index Cond: (title % 'foo'::text)
>
> This is currently the worst case in the gist - gin comparison because
> in the index scan, gin version doesn't have the length of the indexed
> string. So it returns a lot of rows which have every trigram of your
> search string but has in fact a low similarity due to the length of
> the indexed string (5555 rows -> 7 rows).
> It cannot be fixed at the moment due to the way GIN indexes work.
>
>> So, the GiN version seems to be a bit faster for long queries, but it's
>> still
>> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for
>> these
>> three queries, so for the longer queries, the gain is only about a factor
>> two. (By the way, I would like to stress that this is not my personal music
>> collection! :-P)
>
> The fact is that pg_trgm is designed to index words and not to index
> long sentences. I'm not that surprised it's slow in your case.
>
> It's also my case but following the instructions in README.pg_trgm I
> created a dictionary of words using tsearch2 (stat function) and I use
> pg_trgm on this dictionary to find similar words in my dictionary.
>
> For example, I rewrite the search:
> auberge cevenes
> as:
> (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol |
> cevennes)
> using pg_trgm and my query can find Auberge des C?vennes (currently
> it's limited to the 4th most similar words but I can change it
> easily).
Did you rewrite query manually or use rewrite feature of tsearch2 ?
>
> --
> Guillaume
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
From | Date | Subject | |
---|---|---|---|
Next Message | Guillaume Smet | 2007-02-26 12:58:20 | Re: pg_trgm performance |
Previous Message | Guillaume Smet | 2007-02-26 12:42:40 | Re: pg_trgm performance |