From: | Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Improved Cost Calculation for IndexOnlyScan |
Date: | 2020-09-29 08:49:23 |
Message-ID: | CANugjhsn+LCMdUxB6KsEiGSh9+0+9Aw8YDFW3HZAyhuv5ebj8g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 29, 2020 at 1:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 29/09/2020 10:06, Hamid Akhtar wrote:
> > In one of my earlier emails [1], I mentioned that there seems to be a
> > problem with how the cost for index only scans is being calculated.
> > [1]
> >
> https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com
> >
> > My concern is that there seems to be a bigger disconnect between the
> > cost of index only scan and the execution time. Having tested this on 3
> > different systems, docker, laptop and a server with RAID 5 SSD
> > configured, at the threshold where index only scan cost exceeds that of
> > sequential scan, index only is still around 30% faster than the
> > sequential scan.
>
> A 30% discrepancy doesn't sound too bad, to be honest. The exact
> threshold depends on so many factors.
>
> > My initial hunch was that perhaps we need to consider a different
> > approach when considering cost for index only scan. However, the
> > solution seems somewhat simple.
> >
> > cost_index function in costsize.c, in case of indexonlyscan, multiplies
> > the number of pages fetched by a factor of (1.0 - baserel->allvisfrac)
> > which is then used to calculate the max_IO_cost and min_IO_cost.
> >
> > This is very similar to the cost estimate methods for indexes internally
> > call genericostesimate function. This function primarily gets the number
> > of pages for the indexes and multiplies that with random page cost
> > (spc_random_page_cost) to get the total disk access cost.
> >
> > I believe that in case of index only scan, we should adjust the
> > spc_random_page_cost in context of baserel->allvisfrac so that it
> > accounts for random pages for only the fraction that needs to be read
> > for the relation and excludes that the index page fetches.
>
> That doesn't sound right to me. The genericcostestimate() function
> calculates the number of *index* pages fetched. It makes no difference
> if it's an Index Scan or an Index Only Scan.
>
> genericcostestimate() could surely be made smarter. Currently, it
> multiplies the number of index pages fetched with random_page_cost, even
> though a freshly created index is mostly physically ordered by the keys.
> seq_page_cost with some fudge factor might be more appropriate, whether
> or not it's an Index Only Scan. Not sure what the exact formula should
> be, just replacing random_page_cost with seq_page_cost is surely not
> right either.
>
> - Heikki
>
So, not actually random replacement here, rather a change with
baserel->allvisfrac taken into consideration (as given below):
----
index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
(1.0 - baserel->allvisfrac), spc_random_page_cost);
----
Does this make sense?
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
SKYPE: engineeredvirus
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2020-09-29 09:03:06 | Re: Fix inconsistency in jsonpath .datetime() |
Previous Message | tsunakawa.takay@fujitsu.com | 2020-09-29 08:28:03 | Disable WAL logging to speed up data loading |