Re: Fix gin index cost estimation

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Fix gin index cost estimation
Date: 2023-01-08 11:45:35
Message-ID: CAPpHfdvYCtwxTBNmvz6_MDNCDDBBNW6n2VzE3=Wy0m6pEvCd+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 6, 2022 at 1:22 PM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> Le vendredi 2 décembre 2022, 13:58:27 CET Ronan Dunklau a écrit :
> > Le vendredi 2 décembre 2022, 12:33:33 CET Alexander Korotkov a écrit :
> > > Hi, Ronan!
> > > Thank you for your patch. Couple of quick questions.
> > > 1) What magic number 50.0 stands for? I think we at least should make
> > > it a macro.
> >
> > This is what is used in other tree-descending estimation functions, so I
> > used that too. Maybe a DEFAULT_PAGE_CPU_COST macro would work for both ? If
> > so I'll separate this into two patches, one introducing the macro for the
> > other estimation functions, and this patch for gin.
>
> The 0001 patch does this.
>
> >
> > > 2) "We only charge one data page for the startup cost" – should this
> > > be dependent on number of search entries?
>
> In fact there was another problem. The current code estimate two different
> pathes for fetching data pages: in the case of a partial match, it takes into
> account that all the data pages will have to be fetched. So this is is now
> taken into account for the CPU cost as well.
>
> For the regular search, we scale the number of data pages by the number of
> search entries.

Now the patch looks good for me. I made some tests.

# create extension pg_trgm;
# create extension btree_gin;
# create table test1 as (select random() as val from
generate_series(1,1000000) i);
# create index test1_gin_idx on test1 using gin (val);

# explain select * from test1 where val between 0.1 and 0.2;
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on test1 (cost=1186.21..7089.57 rows=98557 width=8)
Recheck Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
-> Bitmap Index Scan on test1_gin_idx (cost=0.00..1161.57
rows=98557 width=0)
Index Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
(4 rows)

# create index test1_btree_idx on test1 using btree (val);
# explain select * from test1 where val between 0.1 and 0.2;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Only Scan using test1_btree_idx on test1 (cost=0.42..3055.57
rows=98557 width=8)
Index Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
(2 rows)

Looks reasonable. In this case GIN is much more expensive, because it
can't handle range query properly and overlaps two partial matches.

# create table test2 as (select 'samplestring' || i as val from
generate_series(1,1000000) i);
# create index test2_gin_idx on test2 using gin (val);
# explain select * from test2 where val = 'samplestring500000';
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=20.01..24.02 rows=1 width=18)
Recheck Cond: (val = 'samplestring500000'::text)
-> Bitmap Index Scan on test2_gin_idx (cost=0.00..20.01 rows=1 width=0)
Index Cond: (val = 'samplestring500000'::text)
(4 rows)

# create index test2_btree_idx on test2 using btree (val);
# explain select * from test2 where val = 'samplestring500000';
QUERY PLAN
-----------------------------------------------------------------------------------
Index Only Scan using test2_btree_idx on test2 (cost=0.42..4.44
rows=1 width=18)
Index Cond: (val = 'samplestring500000'::text)
(2 rows)

This also looks reasonable. GIN is not terribly bad for this case,
but B-tree is much cheaper.

I'm going to push this and backpatch to all supported versions if no objections.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2023-01-08 12:28:02 MERGE ... RETURNING
Previous Message David Rowley 2023-01-08 11:03:54 Re: Todo: Teach planner to evaluate multiple windows in the optimal order