From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz> |
Subject: | Re: Fix gin index cost estimation |
Date: | 2023-01-08 11:45:35 |
Message-ID: | CAPpHfdvYCtwxTBNmvz6_MDNCDDBBNW6n2VzE3=Wy0m6pEvCd+g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Dec 6, 2022 at 1:22 PM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> Le vendredi 2 décembre 2022, 13:58:27 CET Ronan Dunklau a écrit :
> > Le vendredi 2 décembre 2022, 12:33:33 CET Alexander Korotkov a écrit :
> > > Hi, Ronan!
> > > Thank you for your patch. Couple of quick questions.
> > > 1) What magic number 50.0 stands for? I think we at least should make
> > > it a macro.
> >
> > This is what is used in other tree-descending estimation functions, so I
> > used that too. Maybe a DEFAULT_PAGE_CPU_COST macro would work for both ? If
> > so I'll separate this into two patches, one introducing the macro for the
> > other estimation functions, and this patch for gin.
>
> The 0001 patch does this.
>
> >
> > > 2) "We only charge one data page for the startup cost" – should this
> > > be dependent on number of search entries?
>
> In fact there was another problem. The current code estimate two different
> pathes for fetching data pages: in the case of a partial match, it takes into
> account that all the data pages will have to be fetched. So this is is now
> taken into account for the CPU cost as well.
>
> For the regular search, we scale the number of data pages by the number of
> search entries.
Now the patch looks good for me. I made some tests.
# create extension pg_trgm;
# create extension btree_gin;
# create table test1 as (select random() as val from
generate_series(1,1000000) i);
# create index test1_gin_idx on test1 using gin (val);
# explain select * from test1 where val between 0.1 and 0.2;
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on test1 (cost=1186.21..7089.57 rows=98557 width=8)
Recheck Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
-> Bitmap Index Scan on test1_gin_idx (cost=0.00..1161.57
rows=98557 width=0)
Index Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
(4 rows)
# create index test1_btree_idx on test1 using btree (val);
# explain select * from test1 where val between 0.1 and 0.2;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Only Scan using test1_btree_idx on test1 (cost=0.42..3055.57
rows=98557 width=8)
Index Cond: ((val >= '0.1'::double precision) AND (val <=
'0.2'::double precision))
(2 rows)
Looks reasonable. In this case GIN is much more expensive, because it
can't handle range query properly and overlaps two partial matches.
# create table test2 as (select 'samplestring' || i as val from
generate_series(1,1000000) i);
# create index test2_gin_idx on test2 using gin (val);
# explain select * from test2 where val = 'samplestring500000';
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=20.01..24.02 rows=1 width=18)
Recheck Cond: (val = 'samplestring500000'::text)
-> Bitmap Index Scan on test2_gin_idx (cost=0.00..20.01 rows=1 width=0)
Index Cond: (val = 'samplestring500000'::text)
(4 rows)
# create index test2_btree_idx on test2 using btree (val);
# explain select * from test2 where val = 'samplestring500000';
QUERY PLAN
-----------------------------------------------------------------------------------
Index Only Scan using test2_btree_idx on test2 (cost=0.42..4.44
rows=1 width=18)
Index Cond: (val = 'samplestring500000'::text)
(2 rows)
This also looks reasonable. GIN is not terribly bad for this case,
but B-tree is much cheaper.
I'm going to push this and backpatch to all supported versions if no objections.
------
Regards,
Alexander Korotkov
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2023-01-08 12:28:02 | MERGE ... RETURNING |
Previous Message | David Rowley | 2023-01-08 11:03:54 | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |