From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: gistchoose vs. bloat |
Date: | 2013-01-24 20:53:21 |
Message-ID: | 51019F41.1060204@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 24.01.2013 22:35, Tom Lane wrote:
> Alexander Korotkov<aekorotkov(at)gmail(dot)com> writes:
>> There is another cause of overhead when use randomization in gistchoose:
>> extra penalty calls. It could be significant when index fits to cache. In
>> order evade it I especially change behaviour of my patch from "look
>> sequentially and choose random" to "look in random order". I think we need
>> to include comparison of CPU time.
>
> Hmm ... actually, isn't that an argument in favor of Heikki's method?
> The way he's doing it, we can exit without making additional penalty
> calls once we've found a zero-penalty tuple and decided not to look
> further (which is something with a pretty high probability).
No, I think Alexander is right, although I believe the difference is
minimal in practice.
If we assume that the there are no zero-penalty tuples on the page, with
both approaches you have to call penalty on every tuple to know which is
best. If there are zero-penalty tuples, then there is a small
difference. With Alexander's method, you can stop looking as soon as you
find a zero-penalty tuple (the order you check the tuples is random).
With my method, you can stop looking as soon as you find a zero-penalty
tuple, *and* you decide to not look further. With the 1/2 probability to
stop looking further, you give up on average after 2 tuples.
So if I'm doing my math right, my patch does on average between 1x - 2x
as many penalty calls as Alexander's, depending on how many zero-penalty
tuples there are. The 2x case is when the page is full of zero-penalty
tuples, in which case the difference isn't big in absolute terms; 2
penalty calls per page versus 1.
BTW, one thing that I wondered about this: How expensive is random()?
I'm assuming not very, but I don't really know. Alexander's patch called
random() for every tuple on the page, while I call it only once for each
equal-penalty tuple. If it's really cheap, I think my/Tom's patch could
be slightly simplified by always initializing keep_current_best with
random() at top of the function, and eliminating the -1 "haven't decided
yet" state. OTOH, if random() is expensive, I note that we only need one
bit of randomness, so we could avoid calling random() so often if we
utilized all the bits from each random() call.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2013-01-24 20:54:52 | Re: text search: restricting the number of parsed words in headline generation |
Previous Message | Noah Misch | 2013-01-24 20:49:54 | Re: Materialized views WIP patch |