Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date: 2024-05-11 01:35:44
Message-ID: CAApHDvrtTKfh7HgAyXBd3KN0s-jxiHzW7sWdm-sFEjP6fGPCkg@mail.gmail.com
Lists: pgsql-general

On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <jimis(at)gmx(dot)net> wrote:
> Indeed that's an awful estimate, the table has more than 1M unique
> values in that column. Looking into pg_stat_user_tables, I can't see the
> partitions having been vacuum'd or analyzed at all. I think they should
> have been auto-analyzed, since they get a ton of INSERTs
> (no deletes/updates though) and I have the default autovacuum settings.
> Could it be that autovacuum starts, but never
> finishes? I can't find anything in the logs.

It's not the partitions getting analyzed you need to worry about for
an ndistinct estimate on the partitioned table. It's auto-analyze or
ANALYZE on the partitioned table itself that you should care about.

If you look at [1], it says "Tuples changed in partitions and
inheritance children do not trigger analyze on the parent table."
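
As a rough sketch of what that means in practice (the table name
"measurements" and column name "item_id" below are hypothetical stand-ins
for the partitioned table and the DISTINCT column, not names from this
thread), something along these lines should show whether the parent has
ever been analyzed and what ndistinct estimate it ends up with:

```sql
-- Hypothetical names: "measurements" is the partitioned (parent) table,
-- "item_id" is the column used in the SELECT DISTINCT.

-- Has the partitioned table itself ever been analyzed?
-- (Assumes a release where partitioned tables appear in this view.)
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'measurements';

-- Gather statistics on the partitioned table manually, since changes
-- in the partitions do not trigger auto-analyze on the parent.
ANALYZE measurements;

-- Inspect the resulting ndistinct estimate for the parent.
SELECT tablename, attname, inherited, n_distinct
FROM pg_stats
WHERE tablename = 'measurements'
  AND attname = 'item_id';
```

Since nothing triggers this automatically, the manual ANALYZE would likely
need to be scheduled (for example after bulk loads). Note that ANALYZE on
a partitioned table samples rows from all partitions, so it may take a
while on a table this size.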

> In any case, even after the planner decides to execute the terrible plan
> with the parallel seqscans, why doesn't it finish right when it finds 10
> distinct values?

It will. It's just that a Sort has to fetch every row from its subnode
before it can return its first row.
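
One way to see this, again only as a sketch with the hypothetical
"measurements" and "item_id" names rather than the real query, is to look
at the plan shape with EXPLAIN and check whether a Sort node sits
underneath the Limit:

```sql
-- Hypothetical query of the same general shape: DISTINCT with a LIMIT.
-- COSTS OFF just makes the plan shape easier to read.
EXPLAIN (COSTS OFF)
SELECT DISTINCT item_id
FROM measurements
LIMIT 10;
```

If a Sort appears below the Limit, the scans beneath it must run to
completion before the first row comes out, so the early stop only applies
to the nodes above the Sort.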

David

[1] https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-STATISTICS
