From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Subject: | Re: Statistics Import and Export |
Date: | 2024-03-11 18:20:36 |
Message-ID: | CADkLM=d3XhRSySPX+CHL7KmBCgt7ZpqL-0NLxdSWNYHx=MBjag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>
>
>
>
> Having thought about it a bit more, I generally like the idea of being
> able to just update one stat instead of having to update all of them at
> once (and therefore having to go look up what the other values currently
> are...). That said, per below, perhaps making it strict is the better
> plan.
>
v8 has it as strict.
>
> > > Also, in some cases we allow the function to be called with a
> > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the
> > > OID ends up being NULL).
> >
> > Thoughts on it emitting a WARN or NOTICE before returning false?
>
> Eh, I don't think so?
>
> Where this is coming from is that we can often end up with functions
> like these being called inside of larger queries, and having them spit
> out WARN or NOTICE will just make them noisy.
>
> That leads to my general feeling of just returning NULL if called with a
> NULL OID, as we would get with setting the function strict.
>
In which case we're failing nearly silently, yes, there is a null returned,
but we have no idea why there is a null returned. If I were using this
function manually I'd want to know what I did wrong, what parameter I
skipped, etc.
> Well, that code is for pg_statistic while I was looking at pg_class (in
> vacuum.c:1428-1443, where we track if we're actually changing anything
> and only make the pg_class change if there's actually something
> different):
>
I can do that, especially since it's only 3 tuples of known types, but my
reservations are summed up in the next comment.
> Not sure why we don't treat both the same way though ... although it's
> probably the case that it's much less likely to have an entire
> pg_statistic row be identical than the few values in pg_class.
>
That would also involve comparing ANYARRAY values, yuk. Also, a matched
record will never be the case when used in primary purpose of the function
(upgrades), and not a big deal in the other future cases (if we use it in
ANALYZE on foreign tables instead of remote table samples, users
experimenting with tuning queries under hypothetical workloads).
> Hmm, that's a valid point, so a NULL passed in would need to set that
> value actually to NULL, presumably. Perhaps then we should have
> pg_set_relation_stats() be strict and have pg_set_attribute_stats()
> handles NULLs passed in appropriately, and return NULL if the relation
> itself or attname, or other required (not NULL'able) argument passed in
> cause the function to return NULL.
>
That's how I have relstats done in v8, and could make it do that for attr
stats.
(What I'm trying to drive at here is a consistent interface for these
> functions, but one which does a no-op instead of returning an ERROR on
> values being passed in which aren't allowable; it can be quite
> frustrating trying to get a query to work where one of the functions
> decides to return ERROR instead of just ignoring things passed in which
> aren't valid.)
>
I like the symmetry of a consistent interface, but we've already got an
asymmetry in that the pg_class update is done non-transactionally (like
ANALYZE does).
One persistent problem is that there is no _safe equivalent to ARRAY_IN, so
that can always fail on us, though it should only do so if the string
passed in wasn't a valid array input format, or the values in the array
can't coerce to the attribute's basetype.
I should also point out that we've lost the ability to check if the export
values were of a type, and if the destination column is also of that type.
That's a non-issue in binary upgrades, but of course if a field changed
from integers to text the histograms would now be highly misleading.
Thoughts on adding a typname parameter that the function uses as a cheap
validity check?
v8 attached, incorporating these suggestions plus those of Bharath and
Bertrand. Still no pg_dump.
As for pg_dump, I'm currently leading toward the TOC entry having either a
series of commands:
SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ...
Or one compound command
SELECT pg_set_relation_stats(t.oid, ...)
pg_set_attribute_stats(t.oid, 'id'::name, ...),
pg_set_attribute_stats(t.oid, 'last_name'::name, ...),
...
FROM (VALUES('foo.bar'::regclass)) AS t(oid);
The second one has the feature that if any one attribute fails, then the
whole update fails, except, of course, for the in-place update of pg_class.
This avoids having an explicit transaction block, but we could get that
back by having restore wrap the list of commands in a transaction block
(and adding the explicit lock commands) when it is safe to do so.
Attachment | Content-Type | Size |
---|---|---|
v8-0001-Create-pg_set_relation_stats-pg_set_attribute_sta.patch | text/x-patch | 41.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey M. Borodin | 2024-03-11 18:27:43 | Re: UUID v7 |
Previous Message | Michał Kłeczek | 2024-03-11 17:58:54 | Alternative SAOP support for GiST |