Re: BRIN cost estimate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BRIN cost estimate
Date: 2017-03-13 01:19:05
Message-ID: CAKJS1f9_Y9cy_n4VMmite_-2zeayyM+mdjpvBKMJ1fg0-4MMXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 March 2017 at 04:40, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Alvaro Herrera wrote:
>
>> I have added this patch to the commitfest, which I've been intending to
>> get in for a long time. I'll be submitting an updated patch, if needed.
>
> Here is Emre's patch rebased to current sources.

Looking over this now, and have noticed a couple of things:

1.

+ Assert(nnumbers == 1);

I think its a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got <nnumbers> elements.

2.

+ Assert(varCorrelation >= 0 && varCorrelation <= 1);

same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?

The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:

/*
* A "correlation" slot describes the correlation between the physical order
* of table tuples and the ordering of data values of this column, as seen
* by the "<" operator identified by staop. (As with the histogram, more
* than one entry could theoretically appear.) stavalues is not used and
* should be NULL. stanumbers contains a single entry, the correlation
* coefficient between the sequence of data values and the sequence of
* their actual tuple positions. The coefficient ranges from +1 to -1.
*/
#define STATISTIC_KIND_CORRELATION 3

3.

+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);

should numBlocks be named numRanges? After all we call the option
"pages_per_range".

4.

+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the

What's meant by "table is copied 4 times" ?

As best as I can work out, it means.

INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

but I'm uncertain, and it's not very clear to me what it means.

5.

+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;

I missed the comment that explains why we divide by two here.

6.

Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.

In this case the estimation is still applied the same way, only the
*indexCorrelation is 0.

Consider:

CREATE TABLE r2 (values INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'r2');

Notice the lack of any stakind being set to 3.

7.

+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;

Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.

blockProportion = Max(blockProportion, baserel->relpages);

during my test the selectivity was set to 64.5, then clamped to 1 by
CLAMP_PROBABILITY(). This does not seem very nice.

Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.

Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Neha Khatri 2017-03-13 01:22:06 Why is there a doubtful copyObject call in add_vars_to_targetlist
Previous Message Masahiko Sawada 2017-03-13 00:59:21 Re: Transactions involving multiple postgres foreign servers