Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-02-26 04:57:40
Message-ID: CADkLM=cCJFH7DE1W+nT4uAxgEL-gOBoKMQORJTnzmoXTFw231w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 25, 2025 at 11:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> > My solution so far is to take allo the v11+ (SELECT array_agg...)
> functions
> > and put them into a LATERAL, two of them filtered by attstattarget > 0
> and
> > a new one aggregating attnames with no filter.
>
> > An alternative would be a new subselect for array_agg(attname) WHERE
> > in.indexprs IS NOT NULL, thus removing the extra compute for the indexes
> > that lack an index expression (i.e. most of them), and thus lack settable
> > stats (at least for now) and wouldn't be affected by the name-jitter
> issue
> > anyway.
>
> Yeah, I've been thinking about that. I think that the idea of the
> current design is that relatively few indexes will have explicit stats
> targets set on them, so most of the time the sub-SELECTs produce no
> data. (Which is not to say that they're cheap to execute.) If we
> pull all the column names for all indexes then we'll likely bloat
> pg_dump's working storage quite a bit. Pulling them only for indexes
> with expression columns should fix that, and as you say we don't need
> the names otherwise.
>
> I still fear that those sub-selects are pretty expensive in aggregate
> -- they are basically forcing a nestloop join -- and maybe we need to
> rethink that whole idea.
>
> BTW, just as a point of order: it is not the case that non-expression
> indexes are free of name-jitter problems. That's because we don't
> bother to rename index columns when the underlying table column is
> renamed, thus:
>

Ouch.

> After dump-n-reload, this index's column will be named "xx".
> That's not relevant to our current problem as long as we
> don't store stats on such index columns, but it's plenty
> relevant to the ALTER INDEX ... SET STATISTICS code.
>

The only way I can imagine those columns getting their own stats is if we
start adding stats for columns of partial indexes, in which case we'd just
bump the predicate to WHERE (i.indexprs IS NOT NULL OR i.indpred IS NOT
NULL)

Just to confirm, we ARE able to assume dense packing of attributes in an
index, and thus we can infer the attnum from the position of the attname in
the aggregated array, and there's no need to do a parallel array_agg of
attnums, yes?

>
> > I'm on the fence about how to handle pg_clear_attribute_stats(), leaning
> > toward overloaded functions.
>
> I kinda felt that we didn't need to bother with an attnum-based
> variant of pg_clear_attribute_stats(), since pg_dump has no
> use for that. I won't stand in the way if you're desperate to
> do it, though.
>

I'm not desperate to slow this thread down, no. We'll stick with
attname-only.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-02-26 05:05:44 Re: Statistics Import and Export
Previous Message Michael Paquier 2025-02-26 04:56:13 Re: Spinlock can be released twice in procsignal.c