disabling seqscan not using primary key index?

From: Luca Ferrari <fluca1978(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: disabling seqscan not using primary key index?
Date: 2021-05-15 12:15:02
Message-ID: CAKoxK+6P38HkoQYXs_Kw0hOJ4sdTe7A34n=oY2EkspQKHxw4cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,
doing a little and trivial experiment, I decided to populate a table
with a primary key (and thus an automatically generated btree index):

testdb=# create table foo( pk serial primary key, i int );
CREATE TABLE
testdb=# insert into foo( i ) select v from generate_series( 1, 1000000 ) v;
INSERT 0 1000000
testdb=# explain select * from foo where i > 10;
QUERY PLAN
------------------------------------------------------------
Seq Scan on foo (cost=0.00..16925.00 rows=999900 width=8)
Filter: (i > 10)
(2 rows)

So far so good, but if I disable seqscan I would expect the planner to
choose the primary key index, because that "should be" the preferred
way to access the table.
On the other hand, the planner enables JIT machinery and executes
again a seqsca.

testdb=# set enable_seqscan to off;
SET
testdb=# explain select * from foo where i > 10;
QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on foo (cost=10000000000.00..10000016925.00 rows=999900 width=8)
Filter: (i > 10)
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
(5 rows)

Now, I know the planner has done things right, and thus has selected
the minor cost plan for the situation, but I would like to better
understand why the primary key index is excluded and why the JIT
machinery is used in this case.

I know that columns pk and i are pretty identitcal, and the same query
made against the column pk do exploits the index:

testdb=# explain select * from foo where pk > 10;
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=6243.89..14835.76 rows=333350 width=8)
Recheck Cond: (pk > 10)
-> Bitmap Index Scan on foo_pkey (cost=0.00..6160.55 rows=333350 width=0)
Index Cond: (pk > 10)
(4 rows)

and the final cost using the index is much lower than the cost of the
seqscan + JIT, that makes me think the index should have been used
also in the previous query.
Any suggestion to better understand?

Please note that even with an extended statistic, the planner does not
choose to use an index:

testdb=# create statistics st_foo ( dependencies ) on pk,i from foo;
CREATE STATISTICS
testdb=# analyze foo;
ANALYZE

testdb=# set enable_seqscan to off;
SET
testdb=# explain select * from foo where i > 10;
QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on foo (cost=10000000000.00..10000016925.00 rows=999900 width=8)
Filter: (i > 10)
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
(5 rows)

Even doing a quick computation about how much does it cost to get from
index to the table for all tuples seems to me much lower than the
initial cost for a not-sequentially-enabled cost:

testdb=# select sum( relpages + reltuples * current_setting(
'cpu_tuple_cost' )::numeric ) from pg_class where relname in ( 'foo',
'foo_pkey' );
sum
-------
27170

I'm clearly missing something here.

Thanks,
Luca

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2021-05-15 14:40:07 Re: disabling seqscan not using primary key index?
Previous Message frank picabia 2021-05-14 17:14:34 Empty tablespace directory after restore with psql -d dbname -f dump.sql