From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Fixing a couple of buglets in how VACUUM sets visibility map bits |
Date: | 2023-01-11 00:39:30 |
Message-ID: | CAH2-WzmisLS0Gxj=+NJLAxqC9bZmUV4W3CdD0dxnisf7WeXPcQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 10, 2023 at 12:08 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Actually, FreezeMultiXactId() can fully remove an xmax that has some
> member XIDs >= OldestXmin, provided FRM_NOOP processing isn't
> possible, at least when no individual member is still running. Doesn't
> have to involve transaction aborts at all.
>
> Let me go try to break it that way...
Attached patch shows how this could break.
It adds an assertion that checks that the expected
PD_ALL_VISIBLE/VM_ALL_VISIBLE() conditions hold at the right point. It
also comments out FreezeMultiXactId()'s FRM_NOOP handling.
The FRM_NOOP case is really just an optimization, and shouldn't be
needed for correctness. This is amply demonstrated by running "meston
test" with the patch applied, which will pass without incident.
I can get the PD_ALL_VISIBLE assertion to fail by following this
procedure with the patch applied:
* Run a plain VACUUM to set all the pages from a table all-visible,
but not all-frozen.
* Set a breakpoint that will hit after all_visible_according_to_vm is
set to true, for an interesting blkno.
* Run VACUUM FREEZE. We need FREEZE in order to be able to hit the
relevant visibilitymap_set() call site (the one that passes
VISIBILITYMAP_ALL_FROZEN as its flags, without also passing
VISIBILITYMAP_ALL_VISIBLE).
Now all_visible_according_to_vm is set to true, but we don't have a
lock/pin on the same heap page just yet.
* Acquire several non-conflicting row locks on a row on the block in
question, so that a new multi is allocated.
* End every session whose XID is stored in our multi (commit/abort).
* Within GDB, continue from before -- observe assertion failure.
Obviously this scenario doesn't demonstrate the presence of a bug --
not quite. But it does prove that we rely on FRM_NOOP to not allow the
VM to become corrupt, which just doesn't make any sense, and can't
have been intended. At a minimum, it strongly suggests that the
current approach is very fragile.
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
0001-Debug-freeze-map-issue.patch | application/octet-stream | 1.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2023-01-11 00:48:16 | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Previous Message | Thomas Munro | 2023-01-11 00:12:46 | Re: Use windows VMs instead of windows containers on the CI |