Re: Partitioned table statistics vs autoanalyze

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Kamil Frydel <k(dot)frydel(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Partitioned table statistics vs autoanalyze
Date: 2021-07-22 12:15:59
Message-ID: 20210722121559.GE19620@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jul 22, 2021 at 01:32:51PM +0200, Kamil Frydel wrote:
> table_1 and table_2 are hash partitioned using volume_id column. Usually we
> make analyze on partitions. We do not make analyze on the partitioned table
> (parent).
> However, if we run 'analyze' on the partitioned table then planner starts
> choosing hash join. As a comparison, the execution using nested loop takes
> about 15 minutes and if it is done using hash join then the query lasts for
> about 1 minute. When running 'analyze' for the partitioned table, postgres
> inserts statistics for the partitioned table into pg_stats (pg_statistics).
> Before that, there are only statistics for partitions. We suspect that this
> is the reason for selecting bad query plan.

> updated, the inheritance statistics will not be up to date unless you run
> ANALYZE manually.
> (https://www.postgresql.org/docs/13/sql-analyze.html)
>
> I would appreciate if anyone could shed some light on the following
> questions:
> 1) Is this above paragraph from docs still valid in PG 13 and does it apply
> to declarative partitioning as well? Is running analyze manually on a
> partitioned table needed to get proper plans for queries on partitioned
> tables? Partitioned table (in the declarative way) is ”virtual” and does not
> keep any data so it seems that there are no statistics that can be gathered
> from the table itself and statistics from partitions should be sufficient.

Up through v13, autoanalyze doesn't collect stats on parent tables (neither
declarative nor inheritence). I agree that this doesn't seem to be well
documented. I think it should also be mentioned here:
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

In v14 (which is currently in beta), autoanalyze will process the partitioned
table automatically:
https://www.postgresql.org/docs/14/release-14.html
|Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
|Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows when to process them.

> 2) Why does the planner need these statistics since they seem to be unused
> in the query plan. The query plan uses only partitions, not the partitioned
> table.

The "inherited" stats are used when you SELECT FROM table. The stats for the
individual table would be needed when you SELECT FROM ONLY table (which makes
no sense for a partitioned table).

--
Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message ldh@laurent-hasson.com 2021-07-22 13:37:18 RE: Big performance slowdown from 11.2 to 13.3
Previous Message Kamil Frydel 2021-07-22 11:32:51 Partitioned table statistics vs autoanalyze