From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: logical decoding bug when mapped relation with toast contents is rewritten repeatedly
Date: 2018-10-10 20:57:19
Message-ID: 20181010205719.hs3cvfakipmfmkvv@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2018-09-13 19:10:46 -0700, Andres Freund wrote:
> (Tomas, CCing you because you IIRC mentioned encountering an issue like
> this)
>
> I just spent quite a while debugging an issue where running logical
> decoding yielded a:
> ERROR: could not map filenode "base/$X/$Y" to relation OID
> error.
>
> After discarding like 30 different theories, I have found the cause:
>
> During rewrites (i.e. VACUUM FULL / CLUSTER) of a mapped relation with a
> toast table with actual live toasted tuples (pg_proc in my case and
> henceforth) heap inserts with the toast data happen into the new toast
> relation, triggered from within raw_heap_insert().
> At that point the new toast relation does *NOT* appear to be a system
> catalog; it just appears as an "independent" table. Therefore we do
> not trigger the following in heap_insert():
>
> /*
> * RelationIsLogicallyLogged
> * True if we need to log enough information to extract the data from the
> * WAL stream.
> *
> * We don't log information for unlogged tables (since they don't WAL log
> * anyway) and for system tables (their content is hard to make sense of, and
> * it would complicate decoding slightly for little gain). Note that we *do*
> * log information for user defined catalog tables since they presumably are
> * interesting to the user...
> */
> #define RelationIsLogicallyLogged(relation) \
> (XLogLogicalInfoActive() && \
> RelationNeedsWAL(relation) && \
> !IsCatalogRelation(relation))
>
> /*
> * For logical decoding, we need the tuple even if we're doing a full
> * page write, so make sure it's included even if we take a full-page
> * image. (XXX We could alternatively store a pointer into the FPW).
> */
> if (RelationIsLogicallyLogged(relation))
> {
> xlrec.flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
> bufflags |= REGBUF_KEEP_DATA;
> }
>
> i.e. the inserted toast tuple will be marked as
> XLH_INSERT_CONTAINS_NEW_TUPLE - which it shouldn't be, because it belongs
> to a system table, which we currently do not allow to be logically decoded.
>
> That normally ends up being harmless, because ReorderBufferCommit() has the
> following check:
> if (!RelationIsLogicallyLogged(relation))
> goto change_done;
>
> but to reach that check, we first have to map the relfilenode from the
> WAL to the corresponding OID:
> reloid = RelidByRelfilenode(change->data.tp.relnode.spcNode,
> change->data.tp.relnode.relNode);
>
> That works correctly if there's only one rewrite - the relmapper
> contains the data for the new toast table. But if there have been *two*
> consecutive rewrites, the relmapper *does not* contain the intermediary
> relfilenode of pg_proc. There's no such problem for non-mapped tables,
> because historic snapshots allow us to access the relevant data, but the
> relmapper isn't MVCC.
>
> Therefore the catalog-rewrite escape hatch of:
> /*
> * Catalog tuple without data, emitted while catalog was
> * in the process of being rewritten.
> */
> if (reloid == InvalidOid &&
> change->data.tp.newtuple == NULL &&
> change->data.tp.oldtuple == NULL)
> goto change_done;
> does not trigger and we run into:
> else if (reloid == InvalidOid)
> elog(ERROR, "could not map filenode \"%s\" to relation OID",
> relpathperm(change->data.tp.relnode,
> MAIN_FORKNUM));
>
>
> Commenting out this error / converting it into a warning makes this case
> harmless, but that could obviously be problematic in other scenarios.
>
>
> I suspect the proper fix would be to have a new HEAP_INSERT_NO_LOGICAL
> option, and specify that in raw_heap_insert() iff
> !RelationIsLogicallyLogged(state->rs_old_rel), or something like that.
>
> Attached is a *prototype* patch of that approach. Without the
> code-level changes, the additions to test_decoding's rewrite.sql trigger
> the bug; with the changes in place, they no longer do.
>
>
> The only reason the scenario I was debugging hit this was that there was
> a cluster wide VACUUM FULL a couple times a day, and replication was
> several hours behind due to slow network / receiving side.
I've pushed this now. I added more tests, which found an issue with the
change around rewrites of non-mapped catalog tables.
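
For reference, a minimal sketch of the shape of that approach as described
above (an illustration only, not the committed patch; the names are taken
from the quoted code, with state->rs_old_rel being the pre-rewrite relation):

    /* rewriteheap.c, raw_heap_insert(): if the relation being rewritten is
     * not logically logged, the toast inserts done on its behalf into the
     * new toast relation shouldn't be either, so pass the new option down. */
    if (!RelationIsLogicallyLogged(state->rs_old_rel))
        options |= HEAP_INSERT_NO_LOGICAL;

    heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL, options);

    /* heapam.c, heap_insert(): the existing check quoted above additionally
     * tests the new option, so those toast tuples are no longer flagged
     * with XLH_INSERT_CONTAINS_NEW_TUPLE. */
    if (RelationIsLogicallyLogged(relation) &&
        !(options & HEAP_INSERT_NO_LOGICAL))
    {
        xlrec.flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
        bufflags |= REGBUF_KEEP_DATA;
    }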
Greetings,
Andres Freund