From: | Emre Hasegeli <emre(at)hasegeli(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BRIN cost estimate |
Date: | 2017-04-06 08:01:01 |
Message-ID: | CAE2gYzz4FhRV17090-mbFXx8XekPnCHk1qD71q5swC_hX6ePZw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> Good point. That's wrong, but I'm confused at why you kept the:
>
> + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
>
> at all if that's the case. All the BRIN scan is going to do is build a
> bitmap of the matching ranges found.
My mind was not clear when I was working on it a year ago.
I've ended up with:
>
> + /*
> + * Charge a small amount per range tuple which we expect to match to. This
> + * is meant to reflect the costs of manipulating the bitmap. The BRIN scan
> + * will set a bit for each page in the range when we find a matching
> + * range, so we must multiply the charge by the number of pages in the
> + * range.
> + */
> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
> + pagesPerRange;
Isn't BRIN executes the operator only once per range? I think we can just
multiply cpu_operator_cost and estimatedRanges.
I also noticed that you were doing:
>
> + if (get_index_stats_hook &&
> + (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
>
> and
>
> + vardata.statsTuple = SearchSysCache3(STATRELATTINH,
> + ObjectIdGetDatum(index->indexoid),
> + Int16GetDatum(1),
>
> I wondered why you picked to hardcode that as 1, and I thought that's
> surely a bug. But then looking deeper it seems to be copied from
> btcostestimate(), which also seems to be a long-standing bug
> introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
> a patch for that one now.
Yes, I copy-pasted from btcostestimate().
I do still have concerns about how the correlation value is used, and
> the fact that we use the highest value, but then the whole use of this
> value is far from perfect, and is just a rough indication of how
> things are.
I agree. I tried to document how incomplete it is on the comments I wrote
to help future developers improve the situation.
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro HORIGUCHI | 2017-04-06 08:02:14 | Re: Interval for launching the table sync worker |
Previous Message | Kyotaro HORIGUCHI | 2017-04-06 07:50:56 | Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats |