ANALYZE ONLY

From: Michael Harris <harmic(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: ANALYZE ONLY
Date: 2024-08-20 05:52:12
Message-ID: CADofcAWATx_haD=QkSxHbnTsAe6+e0Aw8Eh4H8cXyogGvn_kOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Postgres Hackers

An application that I am developing uses Postgresql, and includes a fairly large
number of partitioned tables which are used to store time series data.

The tables are partitioned by time, and typically there is only one partition
at a time - the current partition - that is actually being updated.
Older partitions
are available for query and eventually dropped.

As per the documentation, partitioned tables are not analyzed by the autovacuum
workers, although their partitions are. Statistics are needed on the partitioned
table level for at least some query planning activities.

The problem is that giving an ANALYZE command targeting a partitioned table
causes it to update statistics for the partitioned table AND all the individual
partitions. There is currently no option to prevent it from including the
partitions.

This is wasteful for our application: for one thing the autovacuum
has already analyzed the individual partitions; for another most of
the partitions
will have had no changes, so they don't need to be analyzed repeatedly.

I took some measurements when running ANALYZE on one of our tables. It
took approx
4 minutes to analyze the partitioned table, then 29 minutes to analyze the
partitions. We have hundreds of these tables, so the cost is very significant.

For my use case at least it would be fantastic if we could add an ONLY option
to ANALYZE, which would cause it to analyze the named table only and not
descend into the partitions.

I took a look at the source and it looks doable, but before launching into it
I thought I would ask a few questions here.

1. Would such a feature be welcomed? Are there any traps I might not
have thought of?

2. The existing ANALYZE command has the following structure:

ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

It would be easiest to add ONLY as another option, but that
doesn't look quite
right to me - surely the ONLY should be attached to the table name?
An alternative would be:

ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]

Any feedback or advice would be great.

Regards
Mike.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-08-20 06:06:45 Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber
Previous Message Ashutosh Bapat 2024-08-20 05:40:46 Re: define PG_REPLSLOT_DIR