From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | cannot freeze committed xmax |
Date: | 2020-10-28 13:44:12 |
Message-ID: | f4aa20ba-7793-31b9-28ac-e34205362535@postgrespro.ru |
Lists: | pgsql-hackers |
Hi hackers,
The following error was encountered by our customers:
They have a very large catalog (the pg_class relation is more than 30GB), bloated by temporary relations.
When they try to vacuum it, the following error is reported:
vacuum full analyze pg_catalog.pg_class;
ERROR: cannot freeze committed xmax 596099954
The following records are present in pg_class:
(standard input)-10436009-<Data> ------
(standard input)-10436010- Item 1 -- Length: 229 Offset: 7936 (0x1f00) Flags: NORMAL
(standard input):10436011: XMIN: 596098791 XMAX: 596099954 CID|XVAC: 1 OID: 930322390
(standard input)-10436012- Block Id: 108700 linp Index: 17 Attributes: 33 Size: 32
(standard input)-10436013- infomask: 0x290b (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
(standard input)-10436014- t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x7f
(standard input)-10436015- [4]: 0x00
(standard input)-10436016-
(standard input)-10436017- Item 2 -- Length: 184 Offset: 7752 (0x1e48) Flags: NORMAL
(standard input):10436018: XMIN: 596098791 XMAX: 596099954 CID|XVAC: 2 OID: 930322393
(standard input)-10436019- Block Id: 108700 linp Index: 18 Attributes: 33 Size: 32
(standard input)-10436020- infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
(standard input)-10436021- t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
(standard input)-10436022- [4]: 0x00
(standard input)-10436023-
(standard input)-10436024- Item 3 -- Length: 184 Offset: 7568 (0x1d90) Flags: NORMAL
(standard input):10436025: XMIN: 596098791 XMAX: 596099954 CID|XVAC: 3 OID: 930322395
(standard input)-10436026- Block Id: 108700 linp Index: 19 Attributes: 33 Size: 32
(standard input)-10436027- infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
(standard input)-10436028- t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
(standard input)-10436029- [4]: 0x00
This error is reported in heap_prepare_freeze_tuple:
    /*
     * Process xmax. To thoroughly examine the current Xmax value we need to
     * resolve a MultiXactId to its member Xids, in case some of them are
     * below the given cutoff for Xids. In that case, those values might need
     * freezing, too. Also, if a multi needs freezing, we cannot simply take
     * it out --- if there's a live updater Xid, it needs to be kept.
     *
     * Make sure to keep heap_tuple_needs_freeze in sync with this.
     */
    xid = HeapTupleGetRawXmax(htup);
    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
        ...
    }
    else if (TransactionIdIsNormal(xid))
    {
        ...
        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            /*
             * If we freeze xmax, make absolutely sure that it's not an XID
             * that is important. (Note, a lock-only xmax can be removed
             * independent of committedness, since a committed lock holder has
             * released the lock).
             */
            if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                TransactionIdDidCommit(xid))
                ereport(ERROR,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("cannot freeze committed xmax " XID_FMT,
                                         xid)));
            freeze_xmax = true;
        }
        else
            freeze_xmax = false;
        ...
    }
    else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
             !TransactionIdIsValid(HeapTupleGetRawXmax(htup)))
    {
        freeze_xmax = false;
        xmax_already_frozen = true;
    }
So, as you can see, HEAP_XMAX_INVALID is set in all of these records, but xmax is a normal transaction ID.
This is why the error is raised before the HEAP_XMAX_INVALID check in the subsequent else-if branch is ever reached.
I do not know the value of cutoff_xid, because I do not have access to a debugger at the customer site.
I would be grateful for any help in localizing the source of the problem.
It looks like nothing guarantees that xmax is set to InvalidTransactionId when the HEAP_XMAX_INVALID bit is set.
And I did not find any check preventing cutoff_xid from being greater than the XID of some transaction that was aborted a long time ago.
So is it a logical error that xmax is compared with cutoff_xid before the HEAP_XMAX_INVALID bit is checked?
Otherwise, where is this constraint most likely being violated?
This is Postgres version 11.7.
Thanks in advance,
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company