From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Patch: add conversion from pg_wchar to multibyte |
Date: | 2012-05-01 22:02:23 |
Message-ID: | CAPpHfdsfg7vcanUBRPJBzPJ5jETVw2sH5LBwpeac=R_C74QTag@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> > Perhaps I'm too early with these tests, but FWIW I reran my earlier test
> > program against three instances. (the patches compiled fine, and make check was without problem).
>
> These tests results seem to be more about the pg_trgm changes than the
> patch actually on this thread, unless I'm missing something. But the
> executive summary seems to be that pg_trgm might need to be a bit
> smarter about costing the trigram-based search, because when the
> number of trigrams is really big, using the index is
> counterproductive. Hopefully that's not too hard to fix; the basic
> approach seems quite promising.
Right. When the number of trigrams is large, scanning the posting lists of
all of them is slow. The solution in this case is to exclude the most
frequent trigrams from the index scan. But that requires some kind of
statistics on trigram frequencies, which we don't have. We could estimate
frequencies using some hard-coded assumptions about natural languages, or
we could exclude arbitrary trigrams, but I don't like either of these
ideas. This problem is also relevant for LIKE/ILIKE searches using trigram
indexes.
Something similar could occur in tsearch when we search for "frequent_term
& rare_term". In some situations (depending on the term frequencies) it's
better to exclude frequent_term from the index scan and do a recheck. We
have the relevant statistics to make such a decision, but it doesn't seem
feasible to get at them through the current GIN interface.
Do you have any comments on the idea of conversion from pg_wchar to
multibyte? Is it acceptable at all?
------
With best regards,
Alexander Korotkov.