Re: ANALYZE ONLY

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Michael Harris <harmic(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jelte Fennema <postgres(at)jeltef(dot)nl>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: ANALYZE ONLY
Date: 2024-08-20 16:26:48
Message-ID: CAGPVpCQpVqkFq1ja6EDU6ZLyJxXSRk2yGbdV0-6GGs87Y_3qtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

Thanks for starting this thread. I've also spent a bit time on this after
reading your first thread on this issue [1]

Michael Harris <harmic(at)gmail(dot)com>, 20 Ağu 2024 Sal, 08:52 tarihinde şunu
yazdı:

> The problem is that giving an ANALYZE command targeting a partitioned table
> causes it to update statistics for the partitioned table AND all the
> individual
> partitions. There is currently no option to prevent it from including the
> partitions.
>
> This is wasteful for our application: for one thing the autovacuum
> has already analyzed the individual partitions; for another most of
> the partitions
> will have had no changes, so they don't need to be analyzed repeatedly.
>

I agree that it's a waste to analyze partitions when they're already
analyzed by autovacuum. It would be nice to have a way to run analyze only
on a partitioned table without its partitions.

> I took some measurements when running ANALYZE on one of our tables. It
> took approx
> 4 minutes to analyze the partitioned table, then 29 minutes to analyze the
> partitions. We have hundreds of these tables, so the cost is very
> significant.
>

I quickly tweaked the code a bit to exclude partitions when a partitioned
table is being analyzed. I can confirm that there is a significant gain
even on a simple case like a partitioned table with 10 partitions and 1M
rows in each partition.

1. Would such a feature be welcomed? Are there any traps I might not
> have thought of?
>
> 2. The existing ANALYZE command has the following structure:
>
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>
> It would be easiest to add ONLY as another option, but that
> doesn't look quite
> right to me - surely the ONLY should be attached to the table name
> An alternative would be:
>
> ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
>

I feel closer to adding this as an option instead of a new keyword in
ANALYZE grammar. To me, it would be easier to have this option and then
give the names of partitioned tables as opposed to typing ONLY before each
partition table.
But we should think of another name as ONLY is used differently (attached
to the table name as you mentioned) in other contexts.

I've been also thinking about how this new option should affect inheritance
tables. Should it have just no impact on them or only analyze the parent
table without taking child tables into account? There are two records for
an inheritance parent table in pg_statistic, one row for only the parent
table and a second row including children. We might only analyze the parent
table if this new "ONLY" option is specified. I'm not sure if that would be
something users would need or not, but I think this option should behave
similarly for both partitioned tables and inheritance tables.

If we decide to go with only partition tables and not care about
inheritance, then naming this option to SKIP_PARTITIONS as Jelte suggested
sounds fine. But that name wouldn't work if this option will affect
inheritance tables.

Thanks,
--
Melih Mutlu
Microsoft

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-08-20 16:28:34 Re: define PG_REPLSLOT_DIR
Previous Message Bertrand Drouvot 2024-08-20 16:26:23 Re: define PG_REPLSLOT_DIR