From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Cc: | Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeremy Finzel <finzelj(at)gmail(dot)com> |
Subject: | Re: found xmin from before relfrozenxid on pg_catalog.pg_authid |
Date: | 2018-05-25 20:37:36 |
Message-ID: | 20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Hi,
Moving discussion to -hackers. Tom, I think you worked most with this
code, your input would be appreciated.
Original discussion is around:
http://archives.postgresql.org/message-id/20180524211311.tnswfnjwnii54htx%40alvherre.pgsql
On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote:
> On 2018-May-24, Andres Freund wrote:
> > Then there's also:
> > http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com
>
> ah, so deleting the relcache file makes the problem to go away? That's
> definitely pretty strange. I see no reason for the value in relcache to
> become out of step with the catalogued value in the same database ... I
> don't think we transmit in any way values of one database to another.
I can reproduce the issue. As far as I can tell we just don't ever
actually update nailed relcache entries in the normal course, leaving
the "physical address" aside. VACUUM will, via
vac_update_relstats() -> heap_inplace_update() -> CacheInvalidateHeapTuple(),
send out an invalidation. But invalidation, in my case another session,
will essentially ignore most of that due to:
static void
RelationClearRelation(Relation relation, bool rebuild)
...
/*
* Never, never ever blow away a nailed-in system relation, because we'd
* be unable to recover. However, we must redo RelationInitPhysicalAddr
* in case it is a mapped relation whose mapping changed.
*
* If it's a nailed-but-not-mapped index, then we need to re-read the
* pg_class row to see if its relfilenode changed. We do that immediately
* if we're inside a valid transaction and the relation is open (not
* counting the nailed refcount). Otherwise just mark the entry as
* possibly invalid, and it'll be fixed when next opened.
*/
if (relation->rd_isnailed)
{
RelationInitPhysicalAddr(relation);
if (relation->rd_rel->relkind == RELKIND_INDEX ||
relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
{
relation->rd_isvalid = false; /* needs to be revalidated */
if (relation->rd_refcnt > 1 && IsTransactionState())
RelationReloadIndexInfo(relation);
}
return;
}
Which basically means that once running we'll never update the relcache
data for nailed entries. That's unproblematic for most relcache fields,
but not for things like RelationData->rd_rel->relfrozenxid / relminmxid.
This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive
vacuums despite being required. And it'll lead, triggering this thread,
to wrong errors being raised during vacuum because relfrozenxid just is
some random value from the past. I suspect this might also be
co-responsible for a bunch of planning issues for queries involving the
catalog, because the planner will use wrong relcache data until the next
time the init file is thrown away?
This looks like a very longstanding bug to me. I'm not yet quite sure
what the best way to deal with this is. I suspect we might get away
with just looking up a new version of the pg_class tuple and copying
rd_rel over?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-05-25 20:38:17 | Re: found xmin from before relfrozenxid on pg_catalog.pg_authid |
Previous Message | Erlend Sogge Heggen | 2018-05-25 20:03:27 | Syndicating PostgreSQL mailing list to Discourse |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-05-25 20:38:17 | Re: found xmin from before relfrozenxid on pg_catalog.pg_authid |
Previous Message | Chapman Flack | 2018-05-25 20:32:45 | Re: SPI/backend equivalent of extended-query Describe(statement)? |