From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Subject: | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date: | 2021-11-01 15:15:27 |
Message-ID: | CAEze2Wj7O5tnM_U151Baxr5ObTJafwH=71_JEmgJV+6eBgjL7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Fri, 29 Oct 2021 at 20:17, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Oct 29, 2021 at 6:30 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > I can propose the debugging patch to reproduce the issue that replaces
> > the hang with the assert and modifies a pair of crash-causing test
> > scripts to simplify the reproducing. (Sorry, I have no time now to prune
> > down the scripts further as I have to leave for a week.)
>
> This bug is similar to the one fixed in commit d9d8aa9b. And so I
> wonder if code like GlobalVisTestFor() is missing something that it
> needs for partitioned tables.
Without `autovacuum = off; fsync = off` I could not replicate the
issue in the configured 10m time window; with those options I did get
the reported trace in minutes.
I think that I also have found the culprit, which is something we
talked about in [0]: GlobalVisState->maybe_needed was not guaranteed
to never move backwards when recalculated, and because vacuum can
update its snapshot bounds (heap_prune_satisfies_vacuum ->
GlobalVisTestIsRemovableFullXid -> GlobalVisUpdate) this maybe_needed
could move backwards, resulting in the observed behaviour.
It was my understanding based on the mail conversation that Andres
would fix this observed issue too while fixing [0] (whose fix was
included with beta 2), but apparently I was wrong; I can't find the
code for 'maybe_needed'-won't-move-backwards-in-a-backend.
I (again) propose the attached patch, which ensures that this
maybe_needed field will not move backwards for a backend. It is
based on 14, but should be applied on head as well, because it's
lacking there as well.
Another alternative would be to replace the use of vacrel->OldestXmin
with `vacrel->vistest->maybe_needed` in lazy_scan_prune, but I believe
that is not legal in how vacuum works (we cannot unilaterally decide
that we want to retain tuples < OldestXmin).
Note: After fixing the issue with retreating maybe_needed I also hit
your segfault, and I'm still trying to find out what the source of
that issue might be. I do think it is an issue seperate from stuck
vacuum, though.
Kind regards,
Matthias van de Meent
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-stuck-vacuum-due-to-retreating-GlobalVisState-ma.patch | application/octet-stream | 2.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2021-11-01 21:09:00 | BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types |
Previous Message | Robert Haas | 2021-11-01 14:17:46 | Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data |