Re: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables

From: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
To: MichaelDBA Vitale <michaeldba(at)sqlexec(dot)com>, "Pierson Patricia L (Contractor)" <Patricia(dot)L(dot)Pierson(at)irs(dot)gov>
Cc: "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables
Date: 2022-07-13 02:54:50
Message-ID: d56ab3ae-baee-c93d-fc0d-b55356b2729a@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 7/12/22 14:25, MichaelDBA Vitale wrote:
> That is not true: doing the select on the primary key will still
> result in a table scan, not an index scan.  The heap always gets
> accessed for select counts.
>
> Regards,
> Michael Vitale

Well, not always:

mgogala=# explain select count(*) from emp;
                        QUERY PLAN
----------------------------------------------------------
 Aggregate  (cost=1.18..1.19 rows=1 width=8)
   ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=0)
(2 rows)

That is the classic "select count(*)".  Now, let's see PK:

mgogala=# \d emp
                           Table "mgogala.emp"
  Column  |            Type             | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
 empno    | smallint                    |           | not null |
 ename    | character varying(10)       |           |          |
 job      | character varying(9)        |           |          |
 mgr      | smallint                    |           |          |
 hiredate | timestamp without time zone |           |          |
 sal      | double precision            |           |          |
 comm     | double precision            |           |          |
 deptno   | smallint                    |           |          |
Indexes:
    "emp_pkey" PRIMARY KEY, btree (empno)
Foreign-key constraints:
    "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)

Now, let's do scan on "count(empno)":

mgogala=# explain select count(empno) from emp;
                        QUERY PLAN
----------------------------------------------------------
 Aggregate  (cost=1.18..1.19 rows=1 width=8)
   ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=2)
(2 rows)

So far, so good. Sequential scan, despite the existence of the NOT NULL
primary key. Now comes the twist:

mgogala=# set enable_seqscan=off;
SET
mgogala=# explain select count(empno) from emp;
                                   QUERY PLAN

--------------------------------------------------------------------------------
-
 Aggregate  (cost=12.38..12.39 rows=1 width=8)
   ->  Index Only Scan using emp_pkey on emp (cost=0.14..12.35 rows=14
width=2)
(2 rows)

Voila, we've got index only "count" scan. So, in some cases Postgres
does do index only scan. What is surprising is that pg_hint_plan doesn't
produce the desired results:

mgogala=# explain /*+ IndexScan(emp emp_pkey) */ select count(empno)
from emp;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=10000000001.17..10000000001.18 rows=1 width=8)
   ->  Seq Scan on emp  (cost=10000000000.00..10000000001.14 rows=14
width=2)
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true,
Deforming true
(5 rows)

This is strange in more ways than one:

1. The hint didn't work, Postgres still performed sequential scan
2. JIT got involved for the reasons that are completely unclear.
3. The cost is enormous, despite the fact the the "emp" table has only
14 rows and is fully contained in a single block.

However, if we do GUC hint, the result behaves as expected:

mgogala=# explain /*+ set(enable_seqscan off) */ select count(empno)
from emp e;
                                    QUERY PLAN

--------------------------------------------------------------------------------
---
 Aggregate  (cost=12.38..12.39 rows=1 width=8)
   ->  Index Only Scan using emp_pkey on emp e (cost=0.14..12.35
rows=14 width=2)
(2 rows)

So, it is possible to force index only scan. However, from purely
theoretic point of view, the speed difference is questionable. Indexes
are stored differently than tables, they have branch blocks and data
bocks with an ample free space in the data blocks to prevent block
splits and index level increase, with mandatory re-balancing. Indexes
are actually quite large structures, sometimes even larger than the
underlying tables. In general, going after the index-only scan doesn't
look like a good strategy.

After all this work, I also have a question: does PostgreSQL read index
blocks in batches or it does single block scans? Some other databases
have 2 events: "db file scattered read" for sequential scan (called
"full table scan" in that specific vernacular) and "db file sequential
read" for index scans. Those database only do single block reads from
indexes while the tables can be read using batches of blocks. Does
PostgreSQL do the same thing?

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Ron 2022-07-13 13:16:27 Re: [EXT] Improve "select count(*)" query - takes more than 30 mins for some large tables
Previous Message Scott Ribe 2022-07-13 02:28:42 Re: [EXT] Improve "select count(*)" query - takes more than 30 mins for some large tables