From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted? - boolean correlation
Date: 2017-12-05 18:50:11
Message-ID: 14438.1512499811@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Dec 3, 2017 15:31, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>> But I do see that ties within the logical order of the column values are
>>> broken to agree with the physical order. That is wrong, right? Is there
>>> any argument that this is desirable?
>> Uh ... what do you propose doing instead? We'd have to do something with
>> ties, and it's not so obvious this way is wrong.
> Let them be tied. If there are 10 distinct values, number the values 0 to
> 9, and all rows of a given distinct value get the same number for the
> logical order axis.
> Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
> me. Although if we switched btree to store duplicate values with tid as a
> tie breaker, then maybe it wouldn't be as obviously wrong.
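
For concreteness, here is a minimal sketch, in Python/numpy rather than
PostgreSQL code, of the two ranking rules being discussed. It assumes a
boolean column whose values are scattered uniformly through the heap;
FRAC_TRUE is a made-up parameter, not the thread's actual test case, and
the real calculation in analyze.c differs in detail.

import numpy as np

rng = np.random.default_rng(0)
N = 100_000
FRAC_TRUE = 0.1                      # hypothetical fraction of 't' rows
vals = rng.random(N) < FRAC_TRUE     # boolean column, in heap order
heap_pos = np.arange(N)              # physical position of each row

# Current behavior: ties within a value are broken by physical position,
# i.e. sort by (value, heap position) and rank the result 0..N-1.
order = np.lexsort((heap_pos, vals))
rank_tiebreak = np.empty(N)
rank_tiebreak[order] = np.arange(N)

# Jeff's proposal: every row of a given distinct value gets the same rank.
rank_tied = np.where(vals, 1.0, 0.0)

print("ties broken by heap order:", np.corrcoef(heap_pos, rank_tiebreak)[0, 1])
print("ties left tied:           ", np.corrcoef(heap_pos, rank_tied)[0, 1])

With uniformly scattered values, the first number comes out well above
zero, around 0.8 for a suitably skewed FRAC_TRUE, while the second stays
near zero, which is the discrepancy Jeff is describing.
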
I thought some more about this. What we really want the correlation stat
to do is help us estimate how randomly an index-ordered scan will access
the heap. If the values we've sampled are all unequal then there's no
particular issue. However, if we have some group of equal values, we
do not really know what order an indexscan will visit them in. The
existing correlation calculation is making the *most optimistic possible*
assumption, that such a group will be visited exactly in heap order ---
and that assumption isn't too defensible. IIRC, a freshly built b-tree
will behave that way, because the initial sort of a btree breaks ties
using heap TIDs; but we don't maintain that property during later
insertions. In any case, given that we do this calculation without regard
to any specific index, we can't reasonably expect to model exactly what
the index will do. It would be best to adopt some middle-of-the-road
assumption about what the heap visitation order will be for a set of
duplicate values: not exactly heap order, but I think we should not use
a worst-case assumption either, since the btree may retain some amount
of its initial ordering.

BTW, I disagree that "correlation = zero" is the right answer for this
particular example. If the btree is freshly built, then an index-order
scan would visit all the heap pages in sequence to fetch "f" rows, and
then visit them all in sequence again to fetch "t" rows, which is a whole
lot better than the purely random access that zero correlation implies.
So I think 0.8 or so is actually a perfectly reasonable answer when the
index is fresh. The trouble is just that it'd behoove us to derate that
answer somewhat for the probability that the index isn't fresh.
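
To put rough numbers on that, here is a back-of-envelope sketch with
made-up table sizes, just to show why two heap-order passes are so much
cheaper than the random access that zero correlation implies.

rows = 1_000_000
rows_per_page = 100
pages = rows // rows_per_page        # 10,000 heap pages

# Freshly built btree: all 'f' rows in heap order, then all 't' rows in
# heap order, i.e. essentially two sequential passes over the heap.
fetches_fresh = 2 * pages            # ~20,000 mostly sequential page reads

# Zero correlation models the visits as random: roughly one heap page
# fetch per row, ignoring caching effects.
fetches_random = rows                # ~1,000,000 scattered page reads

print(fetches_fresh, fetches_random)
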
My first thought for a concrete fix was to use the mean position of
a group of duplicates for the purposes of the correlation calculation,
but on reflection that's clearly wrong. We do have an idea, from the
data we have, whether the duplicates are close together in the heap
or spread all over. Using only mean position would fail to distinguish
those cases, but really we'd better penalize the spread-all-over case.
I'm not sure how to do that.
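
A toy illustration of that point, using two assumed heap layouts that are
not from the thread's test data: both give the same mean position for the
duplicate group, but an index scan over the scattered one touches far more
of the heap.

import numpy as np

N = 100_000
clustered = np.arange(45_000, 55_000)    # duplicates packed together
scattered = np.arange(5, N, 10)          # duplicates spread over the heap

for name, pos in [("clustered", clustered), ("scattered", scattered)]:
    print(name,
          "mean position =", pos.mean(),     # ~50,000 in both cases
          "spread (stddev) =", pos.std())    # small vs. large
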
Or we could leave this calculation alone and try to move towards keeping
equal values in TID order in btrees. Not sure about the downsides of
that, though.
regards, tom lane