From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2025-02-26 04:36:09 |
Message-ID: | 1704472.1740544569@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> My solution so far is to take allo the v11+ (SELECT array_agg...) functions
> and put them into a LATERAL, two of them filtered by attstattarget > 0 and
> a new one aggregating attnames with no filter.
> An alternative would be a new subselect for array_agg(attname) WHERE
> in.indexprs IS NOT NULL, thus removing the extra compute for the indexes
> that lack an index expression (i.e. most of them), and thus lack settable
> stats (at least for now) and wouldn't be affected by the name-jitter issue
> anyway.
Yeah, I've been thinking about that. I think that the idea of the
current design is that relatively few indexes will have explicit stats
targets set on them, so most of the time the sub-SELECTs produce no
data. (Which is not to say that they're cheap to execute.) If we
pull all the column names for all indexes then we'll likely bloat
pg_dump's working storage quite a bit. Pulling them only for indexes
with expression columns should fix that, and as you say we don't need
the names otherwise.
I still fear that those sub-selects are pretty expensive in aggregate
-- they are basically forcing a nestloop join -- and maybe we need to
rethink that whole idea.
BTW, just as a point of order: it is not the case that non-expression
indexes are free of name-jitter problems. That's because we don't
bother to rename index columns when the underlying table column is
renamed, thus:
regression=# create table t1 (id int primary key);
CREATE TABLE
regression=# \d t1_pkey
Index "public.t1_pkey"
Column | Type | Key? | Definition
--------+---------+------+------------
id | integer | yes | id
primary key, btree, for table "public.t1"
regression=# alter table t1 rename column id to xx;
ALTER TABLE
regression=# \d t1_pkey
Index "public.t1_pkey"
Column | Type | Key? | Definition
--------+---------+------+------------
id | integer | yes | xx
primary key, btree, for table "public.t1"
After dump-n-reload, this index's column will be named "xx".
That's not relevant to our current problem as long as we
don't store stats on such index columns, but it's plenty
relevant to the ALTER INDEX ... SET STATISTICS code.
> I'm on the fence about how to handle pg_clear_attribute_stats(), leaning
> toward overloaded functions.
I kinda felt that we didn't need to bother with an attnum-based
variant of pg_clear_attribute_stats(), since pg_dump has no
use for that. I won't stand in the way if you're desperate to
do it, though.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2025-02-26 04:46:19 | Re: Log connection establishment timings |
Previous Message | Sagar Shedge | 2025-02-26 04:09:13 | Re: Extend postgres_fdw_get_connections to return remote backend pid |