From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: trgm regex index peculiarity |
Date: | 2014-04-06 00:52:09 |
Message-ID: | 29409.1396745529@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> Next revision of patch is attached. Changes are so:
> 1) Notion "penalty" is used instead of "size".
> 2) We try to reduce total penalty to WISH_TRGM_PENALTY, but restriction is
> MAX_TRGM_COUNT total trigrams count.
> 3) Penalties are assigned to particular color trigram classes. I.e.
> separate penalties for __a, _aa, _a_, aa_. It's based on analysis of
> trigram frequencies in Oscar Wilde writings. We can end up with different
> numbers, but I don't think they will be dramatically different.

Committed with cosmetic improvements (adjusting the comments mostly).

The new whitespace penalties look reasonably sane to me. I wonder though
if WISH_TRGM_PENALTY is too small --- it seems like this code will tend to
select many fewer trigrams than the old code did. What testing did you do
that led you to select the specific value of 16?

regards, tom lane
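
For readers following the thread, the selection rule described in the quoted patch notes can be sketched roughly as follows. This is a minimal Python illustration, not the committed C code. Only the names WISH_TRGM_PENALTY and MAX_TRGM_COUNT and the value 16 come from the messages above. The value 256, the per-class weights, the count-times-weight penalty, the `removable` flag, and the function names are all assumptions made for the sketch.

```python
from dataclasses import dataclass

WISH_TRGM_PENALTY = 16.0  # soft target; 16 is the value questioned in the reply
MAX_TRGM_COUNT = 256      # hard limit; this number is an assumption for the sketch

# Placeholder weights per whitespace class ("_" = blank, "a" = non-blank).
# Illustrative only; these are NOT the values from the patch.
CLASS_PENALTY = {"aaa": 1.0, "aa_": 3.0, "_aa": 4.0, "_a_": 2.0, "__a": 20.0}


@dataclass
class ColorTrigram:
    pattern: str     # whitespace class, a key of CLASS_PENALTY
    count: int       # number of simple trigrams this color trigram expands to
    removable: bool  # hypothetical: may it be dropped without breaking the search?

    @property
    def penalty(self) -> float:
        # Assumption: penalty scales with the expansion count.
        return self.count * CLASS_PENALTY[self.pattern]


def select_trigrams(candidates):
    """Greedily drop the costliest removable trigrams.

    Returns the kept trigrams in their original order, or None when the
    hard MAX_TRGM_COUNT limit cannot be met.
    """
    total_count = sum(t.count for t in candidates)
    total_penalty = sum(t.penalty for t in candidates)
    dropped = set()

    # Visit candidates from the highest penalty to the lowest.
    for t in sorted(candidates, key=lambda c: c.penalty, reverse=True):
        if total_count <= MAX_TRGM_COUNT and total_penalty <= WISH_TRGM_PENALTY:
            break  # both the hard limit and the soft target are satisfied
        if not t.removable:
            continue
        dropped.add(id(t))
        total_count -= t.count
        total_penalty -= t.penalty

    if total_count > MAX_TRGM_COUNT:
        return None  # the hard limit is a restriction, not a wish
    return [t for t in candidates if id(t) not in dropped]
```

In this sketch a smaller WISH_TRGM_PENALTY causes more trigrams to be dropped, which is the trade-off the question about the value 16 points at. Missing the penalty target is tolerated, while exceeding the count limit is treated as failure.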