From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Disabling Heap-Only Tuples |
Date: | 2023-08-30 16:11:05 |
Message-ID: | CAEze2WgFC3+kXLPyKEgmBoS3OemfkshNHPc7VKatWnU9h5T5Dw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 30 Aug 2023 at 15:31, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Aug 30, 2023 at 9:01 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > I've reworked the patch a bit to remove the "excessive bloat with low
> > fillfactors when local space is available" issue that this parameter
> > could cause - local updates are now done if the selected page we would
> > be inserting into is after the old tuple's page and the old tuple's
> > page still (or: now) has space available.
> >
> > Does that alleviate your concerns?
>
> That seems like a good chance, but my core concern is around people
> having to micromanage local_update_limit, and probably either not
> knowing how to do it properly, or not being able or willing to keep
> updating it as things change.
Assuming you do want to provide a way to users to solve the issue of
"there is a lot of free space in the table, but I don't want to take
an access exclusive lock or wait for new inserts to fix the issue",
how would you suggest we do that then?
Alternative approaches that I can think of are:
- A %-based parameter.
This does scale with the table, but doesn't stop being a performance
hog once you've reached the optimal table size, and thus also needs to
be disabled.
- Measure the parameter from the end of the table, instead of from the
front; i.e. "try to empty the last X=50 MBs of the table".
Scales with the table, but same issue as above - once the table has
an optimal size, it doesn't stop.
- Install one more dynamic system to move the tuples to a better page,
one the users don't directly control (yet to be designed).
I don't know if or when this will be implemented and what benefits
it will have, but we don't have access to a lot of state in
table_tuple_update or heap_update, so any data needs special lookup.
- Let users keep using VACUUM FULL and CLUSTER instead.
I don't think this is a reasonable solution.
> In a way, this parameter is a lot like work_mem, which is notoriously
> very difficult to tune. If you set it too high, you run out of memory.
> If you set it too low, you get bad plans. You can switch from having
> one of those problems to having the other very quickly as load changs,
> and sometimes you can have both at the same time. If an omniscient
> oracle could set work_mem properly for every query based not only on
> what the query does but the state of the system at that moment, it
> would still be a very crude parameter, and since omniscient oracles
> are rare in practice, problems are reasonably common. I think that if
> we add this parameter, it's going to end up in the same category. A
> lot of people will ignore it, and they'll be OK, but 30% of the people
> who do try to use it will shoot themselves in the foot, or something
> like that.
The "shoot yourself in the foot" in this case is limited to "your
UPDATE statement's performance is potentially Y times worse due to
forced FSM lookups for every update at the end of the table". I'll
admit that this is not great, but I'd say it is also not the end of
the world, and still much better than the performance differences that
you can see when the plan changes due to an updated work_mem.
I'd love to have more contextual information available on the table's
free space distribution so that this decision could be made by the
system, but that info just isn't available right now. We don't really
have infrastructure in place that would handle such information
either, and table_tuple_update does not get to use reuse state across
tuples, so any use of information will add cost for every update. With
this patch, the FSM cost is gated behind the storage parameter, and
thus only limited, but I don't think we can store much more than
storage parameters in the Relation data.
VACUUM / ANALYZE could probably create and store sketches about the
free space distribution in the relation, but that would widen the
scope significantly, and I have only limited bandwidth available for
this.
So, while I do plan to implement any small changes or fixes required
to get this in, a major change in direction for this patch won't put
it anywhere high on my active items list.
Kind regards,
Matthias van de Meent
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2023-08-30 16:28:36 | Re: More new SQL/JSON item methods |
Previous Message | Jeff Davis | 2023-08-30 16:09:43 | Re: [17] CREATE SUBSCRIPTION ... SERVER |