Re: Inval reliability, especially for inplace updates

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Inval reliability, especially for inplace updates
Date: 2024-06-15 22:37:18
Message-ID: 20240615223718.42.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> https://postgr.es/m/20240512232923.aa.nmisch@google.com wrote:
> > Separable, nontrivial things not fixed in the attached patch stack:
> >
> > - Inplace update uses transactional CacheInvalidateHeapTuple(). ROLLBACK of
> > CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
> > still seen in inplace-inval.spec. CacheInvalidateRelmap() does this right.
>
> I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
> inside the critical section. Send it in heap_xlog_inplace(), too.

> a. Within logical decoding, cease processing invalidations for inplace

I'm attaching the implementation. This applies atop the v3 patch stack from
https://postgr.es/m/20240614003549.c2.nmisch@google.com, but the threads are
mostly orthogonal and intended for independent review. Translating a tuple
into inval messages uses more infrastructure than relmapper, which needs just
a database ID. Hence, this ended up more like a miniature of inval.c's
participation in the transaction commit sequence.

I waffled on whether to back-patch inplace150-inval-durability-atcommit. The
consequences of that bug are plenty bad, but reaching them requires an error
between TransactionIdCommitTree() and AtEOXact_Inval(). I've not heard
reports of that, and I don't have a recipe for making it happen on demand.
For now, I'm leaning toward back-patch. The main risk would be me overlooking
an LWLock deadlock scenario reachable from the new, earlier RelCacheInitLock
timing. Alternatives for RelCacheInitLock:

- RelCacheInitLock before PreCommit_Notify(), because notify concurrency
matters more than init file concurrency. I chose this.
- RelCacheInitLock after PreCommit_Notify(), because PreCommit_Notify() uses a
heavyweight lock, giving it less risk of undetected deadlock.
- Replace RelCacheInitLock with a heavyweight lock, and keep it before
PreCommit_Notify().
- Fold PreCommit_Inval() back into AtCommit_Inval(), accepting that EIO in
unlink_initfile() will PANIC.

Opinions on that?

The patch changes xl_heap_inplace of XLOG_HEAP_INPLACE. For back branches, we
could choose between:

- Same change, no WAL version bump. Standby must update before primary. This
is best long-term, but the transition is more disruptive. I'm leaning
toward this one, but the second option isn't bad:

- heap_xlog_inplace() could set the shared-inval-queue overflow signal on
every backend. This is more wasteful, but inplace updates might be rare
enough (~once per VACUUM) to make it tolerable.

- Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE. This isn't
correct if one ends recovery between the two records, but you'd need to be
unlucky to notice. Noticing would need a procedure like the following. A
hot standby backend populates a relcache entry, then does DDL on the rel
after recovery ends.

Future cleanup work could eliminate LogStandbyInvalidations() and the case of
!markXidCommitted && nmsgs != 0. Currently, the src/test/regress suite still
reaches that case:

- AlterDomainDropConstraint() queues an inval even if !found; it can stop
that.

- ON COMMIT DELETE ROWS nontransactionally rebuilds an index, which sends a
relcache inval. The point of that inval is, I think, to force access
methods like btree and hash to reload the metapage copy that they store in
rd_amcache. Since no assigned XID implies no changes to the temp index, the
no-XID case could simply skip the index rebuild. (temp.sql reaches this
with a read-only transaction that selects from an ON COMMIT DELETE ROWS
table. Realistic usage will tend not to do that.) ON COMMIT DELETE ROWS
has another preexisting problem for indexes, mentioned in a code comment.

Thanks,
nm

Attachment Content-Type Size
inplace130-AtEOXact_RelationCache-comments-v1.patch text/plain 5.8 KB
inplace140-heapam_xlog-comment-v1.patch text/plain 951 bytes
inplace150-inval-durability-atcommit-v1.patch text/plain 13.6 KB
inplace160-inval-durability-inplace-v1.patch text/plain 38.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gayatri Singh 2024-06-15 23:09:05 Backup and Restore of Partitioned Table in PG-15
Previous Message Melanie Plageman 2024-06-15 22:00:43 Re: RFC: adding pytest as a supported test framework