Re: Supporting = operator in gin/gist_trgm_ops

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Georgios <gkokolatos(at)protonmail(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)free(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Supporting = operator in gin/gist_trgm_ops
Date: 2024-09-17 04:42:09
Message-ID: CAApHDvp3W7G8Oo4=wjt0ceTbic35SHJ=qfoD_CnPXSnZVzCkhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 14 Nov 2020 at 18:31, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> I also have checked that btree_gist is preferred over pg_trgm gist
> index for equality search. Despite our gist cost estimate is quite
> dumb, it selects btree_gist index due to its lower size. So, this
> part also looks good to me.
>
> I'm going to push this if no objections.

(Reviving old thread [1] due to a complaint from a customer about a
performance regression after upgrading PG13 to PG15)

I think the comparisons you made may have been too simplistic. Do you
recall what your test case was?

I tried comparing btree to gist with gist_trgm_ops and found that the
cost estimates for GIST come out cheaper than btree. Btree only wins
in the most simplistic tests due to Index Only Scans. The test done in
[2] seems to have fallen for that mistake.

create extension if not exists pg_trgm;
create table a (a varchar(250), b varchar(250), c varchar(250));
insert into a select md5(a::text),md5(a::text),md5(a::text) from
generate_Series(1,1000000)a;
create index a_a_btree on a (a);
create index a_a_gist on a using gist (a gist_trgm_ops);
vacuum freeze analyze a;

-- Gist index wins
explain (analyze, buffers) select * from a where a = '1234';

Index Scan using a_a_gist on a (cost=0.41..8.43 rows=1 width=99)
Index Cond: ((a)::text = '1234'::text)
Rows Removed by Index Recheck: 1
Buffers: shared hit=14477
Planning Time: 0.055 ms
Execution Time: 23.861 ms
(6 rows)

-- hack to disable gist index.
update pg_index set indisvalid = false where indexrelid='a_a_gist'::regclass;
explain (analyze, buffers) select * from a where a = '1234';

Index Scan using a_a_btree on a (cost=0.42..8.44 rows=1 width=99)
Index Cond: ((a)::text = '1234'::text)
Buffers: shared read=3
Planning:
Buffers: shared hit=8
Planning Time: 0.090 ms
Execution Time: 0.048 ms (497.1 times faster)
(7 rows)

-- re-enable gist.
update pg_index set indisvalid = true where indexrelid='a_a_gist'::regclass;

-- try a query that supports btree with index only scan. Btree wins.
explain (analyze, buffers) select a from a where a = '1234';

Index Only Scan using a_a_btree on a (cost=0.42..4.44 rows=1 width=33)
Index Cond: (a = '1234'::text)
Heap Fetches: 0
Buffers: shared read=3
Planning Time: 0.185 ms
Execution Time: 0.235 ms
(6 rows)

-- Disable IOS and Gist index wins again.
set enable_indexonlyscan=0;
explain (analyze, buffers) select a from a where a = '1234';

Index Scan using a_a_gist on a (cost=0.41..8.43 rows=1 width=33)
Index Cond: ((a)::text = '1234'::text)
Rows Removed by Index Recheck: 1
Buffers: shared hit=11564 read=3811
Planning Time: 0.118 ms
Execution Time: 71.270 ms (303.2 times faster)
(6 rows)

This does not seem ideal given that the select * with the btree is
~500 times faster than with the gist plan.

For now, I've recommended the GIST indexes are moved to another
tablespace with an increased random_page_cost to try to coax the
planner to use the btree index.

I wonder if we can do something to fix this so the different
tablespace idea isn't the permanent solution. I had a look to see why
the GIST costs come out cheaper. It looks like it's the startup cost
calculation that's slightly different from the btree costing. The
attached patch highlights the difference. When applied both indexes
come out at the same cost and which one is picked is down to which
index has the lower Oid. I've not studied if there's a reason why this
code is different in gist.

David

[1] https://postgr.es/m/CAPpHfducQ0U8noyb2L3VChsyBMsc5V2Ej2whmEuxmAgHa2jVXg@mail.gmail.com
[2] https://postgr.es/m/CAOBaU_YkkhakwTG4oA886T4CQsHG5hfY%2BxGA3dTBdZM%2BDTYJWA%40mail.gmail.com

Attachment Content-Type Size
gist_cost.diff application/octet-stream 598 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-09-17 04:48:35 Re: Add contrib/pg_logicalsnapinspect
Previous Message shveta malik 2024-09-17 03:38:17 Re: Allow logical failover slots to wait on synchronous replication