From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | a(dot)kozhemyakin(at)postgrespro(dot)ru, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17741: vacuum process hangs after pg_surgery manipulations |
Date: | 2023-01-18 01:40:39 |
Message-ID: | CAD21AoBYvTfc9E+3p6ecN2n=UsftggWaQiZo1xtYnObQ-uTiQQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Tue, Jan 17, 2023 at 12:37 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2023-Jan-09, PG Bug reporting form wrote:
>
> > On the REL_15_STABLE, you can hang vacuum freeze. Maybe this is not
> > desired?
> > https://www.postgresql.org/docs/current/pgsurgery.html
> >
> > reproduce script:
> > create extension pg_surgery;
>
> Using pg_surgery is the equivalent of introducing corruption in your
> data. It has, of course, completely valid uses, but if you break the
> system while using it, it's on you to fix it.
>
> The pg_surgery documentation you cite states:
>
> : These functions are unsafe by design and using them may corrupt (or
> : further corrupt) your database.
>
> So, you've been warned.
While this is completely true and I agree, can we improve this
situation somewhat so that it ends up with an error instead of
hanging?

In this case, the tuple with a = 1, the root of the HOT chain, was
killed, and the tuple with a = 2 was a heap-only tuple that had been
HOT-updated. In heap_page_prune(), we can normally prune the tuple
with a = 2 as part of pruning its chain, but since the root tuple was
already killed we could not prune this tuple. Then we ended up
retrying heap_page_prune(), since it looked as if the tuple had become
dead after heap_page_prune() examined it. Normally retrying
heap_page_prune() works, but in this case, since we don't have the
root tuple, it misses the tuple again, and vacuum hangs. I think that
we didn't have this hang before 8523492d4e3, even in the same
corruption case. One idea to improve this situation is to add a
sanity check that detects when we have retried due to the same tuple.
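
To make the idea concrete, here is a rough, self-contained C sketch of
such a check. It is a toy model and not PostgreSQL code: ToyItem,
ToyPage, toy_prune() and toy_scan_prune() are hypothetical names, and
the "reachable" flag merely stands in for "can be reached from a HOT
chain root". The point is only that a retry triggered twice by the same
offset can be turned into an error instead of an endless loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
#define TOY_MAX_ITEMS 8

typedef struct
{
    bool used;       /* line pointer is in use */
    bool dead;       /* visibility check would report the tuple as dead */
    bool reachable;  /* tuple can be reached from a HOT chain root */
} ToyItem;

typedef struct
{
    ToyItem items[TOY_MAX_ITEMS];
    size_t  nitems;
} ToyPage;

typedef enum { TOY_SCAN_OK, TOY_SCAN_CORRUPT } ToyScanResult;

/* Pruning only removes dead tuples it can reach through a chain root. */
void
toy_prune(ToyPage *page)
{
    for (size_t off = 0; off < page->nitems; off++)
    {
        ToyItem *item = &page->items[off];

        if (item->used && item->dead && item->reachable)
            item->used = false;
    }
}

/*
 * Prune, then scan. A dead tuple found after pruning triggers a retry,
 * but a second retry caused by the same offset is reported as corruption
 * rather than retried forever.
 */
ToyScanResult
toy_scan_prune(ToyPage *page, int *nretries)
{
    size_t last_retry_off = SIZE_MAX;   /* no retry has happened yet */

    assert(page != NULL && nretries != NULL);
    *nretries = 0;

retry:
    toy_prune(page);

    for (size_t off = 0; off < page->nitems; off++)
    {
        const ToyItem *item = &page->items[off];

        if (!item->used || !item->dead)
            continue;

        /* Sanity check: pruning already failed to remove this tuple. */
        if (off == last_retry_off)
            return TOY_SCAN_CORRUPT;

        last_retry_off = off;
        (*nretries)++;
        goto retry;
    }

    return TOY_SCAN_OK;
}
```

In this model, a page whose only dead tuple is unreachable (the root
was killed) makes toy_scan_prune() return TOY_SCAN_CORRUPT after one
retry, where the unchecked loop would spin forever. In the real code
the equivalent place would presumably raise an ERROR.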
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com