pgsql: Fix logical decoding error when system table w/ toast is repeate

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Fix logical decoding error when system table w/ toast is repeate
Date: 2018-10-10 20:56:28
Message-ID: E1gALWu-0002D7-VW@gemulon.postgresql.org
Lists: pgsql-committers

Fix logical decoding error when system table w/ toast is repeatedly rewritten.

Repeatedly rewriting a mapped catalog table with VACUUM FULL or
CLUSTER could cause logical decoding to fail with:
ERROR: could not map filenode "%s" to relation OID

To trigger the problem the rewritten catalog had to have live tuples
with toasted columns.

The problem arose because, during catalog table rewrites, the
heap_insert() check that prevents logical decoding information from
being emitted for system catalogs failed to treat the new heap's toast
table as a system catalog (the new heap is not recognized as a
catalog table via RelationIsLogicallyLogged()). The relmapper, in
contrast to the normal catalog contents, does not contain historical
information. After a single rewrite of a mapped table the new relation
is known to the relmapper, but if the table is rewritten twice before
logical decoding occurs, the relfilenode can no longer be mapped to a
relation, and decoding errors out. This only happens for toast tables,
because the main table contents aren't re-inserted with heap_insert().
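
The effect of a mapping without history can be illustrated with a small
toy model. This is only a sketch under stated assumptions, not PostgreSQL
code: every name below (ToyRelMapEntry, toy_rewrite(),
toy_filenode_to_relid(), toy_decode_after_rewrites()) and the OID and
filenode values are made up for illustration. It assumes a change is
logged against the filenode created by the first rewrite and is decoded
only after all rewrites have finished.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef unsigned int ToyOid;
#define TOY_INVALID_OID 0u

typedef struct ToyRelMapEntry
{
    ToyOid relid;        /* stable relation OID */
    ToyOid relfilenode;  /* current on-disk file; older values are not kept */
} ToyRelMapEntry;

/* A rewrite swaps in a new filenode and forgets the previous one. */
static void
toy_rewrite(ToyRelMapEntry *entry, ToyOid new_filenode)
{
    entry->relfilenode = new_filenode;
}

/* Reverse lookup needed at decoding time: filenode -> relation OID. */
static ToyOid
toy_filenode_to_relid(const ToyRelMapEntry *map, size_t nentries,
                      ToyOid filenode)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (map[i].relfilenode == filenode)
            return map[i].relid;
    }
    return TOY_INVALID_OID;
}

/*
 * Rewrite one mapped relation `rewrites` times, remember the filenode that
 * the first rewrite logged a change against, then look it up afterwards,
 * as decoding would.
 */
static ToyOid
toy_decode_after_rewrites(int rewrites)
{
    ToyRelMapEntry map[1] = {{1000u, 5000u}};
    ToyOid logged_filenode = TOY_INVALID_OID;

    assert(rewrites >= 1);
    for (int i = 0; i < rewrites; i++)
    {
        toy_rewrite(&map[0], 5001u + (ToyOid) i);
        if (i == 0)
            logged_filenode = map[0].relfilenode;
    }
    return toy_filenode_to_relid(map, 1, logged_filenode);
}
```

In this model toy_decode_after_rewrites(1) resolves to relation 1000,
while toy_decode_after_rewrites(2) returns TOY_INVALID_OID, the toy
counterpart of a filenode that can no longer be mapped.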

The fix is simple: add a new heap_insert() flag that prevents logical
decoding information from being emitted, and accept during decoding
that there might not be tuple data for toast tables.
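
The insert-side half of the fix can be sketched roughly as follows. This
is an assumption-laden illustration rather than the committed code: the
flag name and value TOY_INSERT_NO_LOGICAL, the ToyRelation struct and
both functions are hypothetical stand-ins, and the real check in
heap_insert() may be structured differently. The sketch only shows why
an explicit flag helps when the relation-based check cannot recognize
the new heap's toast table as a catalog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bit; not the real flag's name or value. */
#define TOY_INSERT_NO_LOGICAL 0x0010

typedef struct ToyRelation
{
    bool wal_level_logical;      /* logical decoding is enabled */
    bool recognized_as_catalog;  /* false for the new heap's toast table */
} ToyRelation;

/* Stand-in for a relation-based "is this logically logged?" check. */
static bool
toy_is_logically_logged(const ToyRelation *rel)
{
    return rel->wal_level_logical && !rel->recognized_as_catalog;
}

/*
 * Decide whether an insert emits logical decoding information. The
 * caller-supplied flag wins over whatever the relation-based check says.
 */
static bool
toy_insert_emits_logical_info(const ToyRelation *rel, int options)
{
    if (options & TOY_INSERT_NO_LOGICAL)
        return false;
    return toy_is_logically_logged(rel);
}

/* Self-check covering the rewrite case described above. */
static void
toy_check_rewrite_insert(void)
{
    ToyRelation new_toast = {true, false};

    /* Without the flag the insert is (wrongly) logged for decoding. */
    assert(toy_insert_emits_logical_info(&new_toast, 0));
    /* With the flag the rewrite code suppresses it explicitly. */
    assert(!toy_insert_emits_logical_info(&new_toast, TOY_INSERT_NO_LOGICAL));
}
```

The design point is that the rewrite code knows it is rewriting a
catalog even when the relation being inserted into does not look like
one, so it passes that knowledge down as an option instead of relying
on the relation-based check.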

Unfortunately that does not fix pre-existing logical decoding
errors. Doing so would require not throwing an error when a filenode
cannot be mapped to a relation during decoding, and that seems too
likely to hide bugs. If it's crucial to fix decoding for an existing
slot, temporarily changing the ERROR in ReorderBufferCommit() to a
WARNING appears to be the best fix.

Author: Andres Freund
Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de
Backpatch: 9.4-, where logical decoding was introduced

Branch
------
REL_11_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/88670a4366110c946ef47048d1cebd641209fb0d

Modified Files
--------------
contrib/test_decoding/expected/rewrite.out      | 75 +++++++++++++++++++++++++
contrib/test_decoding/sql/rewrite.sql           | 42 +++++++++++++-
src/backend/access/heap/heapam.c                | 11 +++-
src/backend/access/heap/rewriteheap.c           | 19 ++++++-
src/backend/replication/logical/reorderbuffer.c | 25 +++++++--
src/include/access/heapam.h                     |  1 +
6 files changed, 163 insertions(+), 10 deletions(-)
