Author: Noah Misch Commit: Noah Misch Fix data loss at inplace update after heap_update(). As previously-added tests demonstrated, heap_inplace_update() could instead update an unrelated tuple of the same catalog. It could lose the update. Losing relhasindex=t was a source of index corruption. Inplace-updating commands like VACUUM will now wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a long-running GRANT already hurts VACUUM progress more just by keeping an XID running. The VACUUM will behave like a DELETE or UPDATE waiting for the uncommitted change. For implementation details, start at the heap_inplace_update_scan() header comment and README.tuplock. Back-patch to v12 (all supported versions). In back branches, retain a deprecated heap_inplace_update(), for extensions, and add LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock index 6441e8b..2a5b114 100644 --- a/src/backend/access/heap/README.tuplock +++ b/src/backend/access/heap/README.tuplock @@ -153,3 +153,56 @@ The following infomask bits are applicable: We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set. + +Locking to write inplace-updated tables +--------------------------------------- + +[This is the plan, but LOCKTAG_TUPLE acquisition is not yet here.] + +If IsInplaceUpdateRelation() returns true for a table, the table is a system +catalog that receives heap_inplace_update_scan() calls. Preparing a +heap_update() of these tables follows additional locking rules, to ensure we +don't lose the effects of an inplace update. In particular, consider a moment +when a backend has fetched the old tuple to modify, not yet having called +heap_update(). Another backend's inplace update starting then can't conclude +until the heap_update() places its new tuple in a buffer. We enforce that +using locktags as follows. While DDL code is the main audience, the executor +follows these rules to make e.g. "MERGE INTO pg_class" safer. Locking rules +are per-catalog: + + pg_class heap_inplace_update_scan() callers: before the call, acquire + LOCKTAG_RELATION in mode ShareLock (CREATE INDEX), ShareUpdateExclusiveLock + (VACUUM), or a mode with strictly more conflicts. If the update targets an + index's pg_class row, that lock must be on the table. Locking the index rel + is optional. (This allows VACUUM to overwrite per-index pg_class while + holding a lock on the table alone.) We could allow weaker locks, in which + case the next paragraph would simply call for stronger locks for its class + of commands. heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE + in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it + overwrites. + + pg_class heap_update() callers: before copying the tuple to modify, take a + lock that conflicts with at least one of those from the preceding paragraph. + SearchSysCacheLocked1() is one convenient way to acquire LOCKTAG_TUPLE. + After heap_update(), release any LOCKTAG_TUPLE. Most of these callers opt + to acquire just the LOCKTAG_RELATION. + + pg_database: before copying the tuple to modify, all updaters of pg_database + rows acquire LOCKTAG_TUPLE. (Few updaters acquire LOCKTAG_OBJECT on the + database OID, so it wasn't worth extending that as a second option.) 
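To make the "pg_class heap_update() callers" rule above concrete, here is a minimal sketch (not part of the patch; the function name is hypothetical) of a transactional pg_class updater that satisfies the rule by locking the target relation itself. ShareUpdateExclusiveLock conflicts with both relation lock modes named for inplace updaters (ShareLock for CREATE INDEX, ShareUpdateExclusiveLock for VACUUM), so no inplace update of the same row can be in progress:

#include "postgres.h"

#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/indexing.h"
#include "catalog/pg_class.h"
#include "storage/lmgr.h"
#include "utils/syscache.h"

/* Hypothetical example; not part of the patch. */
static void
example_set_relhastriggers(Oid relid, bool value)
{
	Relation	pg_class;
	HeapTuple	tuple;

	/*
	 * Lock the target relation before fetching the tuple to modify.
	 * ShareUpdateExclusiveLock conflicts with the relation locks that
	 * inplace updaters hold, satisfying the rule above.
	 */
	LockRelationOid(relid, ShareUpdateExclusiveLock);

	pg_class = table_open(RelationRelationId, RowExclusiveLock);

	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	((Form_pg_class) GETSTRUCT(tuple))->relhastriggers = value;

	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

	heap_freetuple(tuple);
	table_close(pg_class, RowExclusiveLock);
}

Real callers such as ALTER TABLE typically already hold AccessExclusiveLock on the target relation, which also satisfies the rule.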
+ +Ideally, DDL might want to perform permissions checks before LockTuple(), as +we do with RangeVarGetRelidExtended() callbacks. We typically don't bother. +LOCKTAG_TUPLE acquirers release it after each row, so the potential +inconvenience is lower. + +Reading inplace-updated columns +------------------------------- + +Inplace updates create an exception to the rule that tuple data won't change +under a reader holding a pin. A reader of a heap_fetch() result tuple may +witness a torn read. Current inplace-updated fields are aligned and are no +wider than four bytes, and current readers don't need consistency across +fields. Hence, they get by with just fetching each field once. XXX such a +caller may also read a value that has not reached WAL; see +heap_inplace_update_finish(). diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 4eda445..b116fe1 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -76,6 +76,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, Buffer newbuf, HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tuple, bool all_visible_cleared, bool new_all_visible_cleared); +#ifdef USE_ASSERT_CHECKING +static void check_inplace_rel_lock(HeapTuple oldtup); +#endif static Bitmapset *HeapDetermineColumnsInfo(Relation relation, Bitmapset *interesting_cols, Bitmapset *external_cols, @@ -97,6 +100,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask, static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple, ItemPointer ctid, TransactionId xid, LockTupleMode mode); +static bool inplace_xmax_lock(SysScanDesc scan); static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask, uint16 *new_infomask2); static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax, @@ -4069,6 +4073,45 @@ l2: return TM_Ok; } +#ifdef USE_ASSERT_CHECKING +/* + * Confirm adequate relation lock held, per rules from README.tuplock section + * "Locking to write inplace-updated tables". + */ +static void +check_inplace_rel_lock(HeapTuple oldtup) +{ + Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup); + Oid relid = classForm->oid; + Oid dbid; + LOCKTAG tag; + + if (IsSharedRelation(relid)) + dbid = InvalidOid; + else + dbid = MyDatabaseId; + + if (classForm->relkind == RELKIND_INDEX) + { + Relation irel = index_open(relid, AccessShareLock); + + SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid); + index_close(irel, AccessShareLock); + } + else + SET_LOCKTAG_RELATION(tag, dbid, relid); + + if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true)) + elog(WARNING, + "missing lock on relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)", + NameStr(classForm->relname), + relid, + classForm->relkind, + ItemPointerGetBlockNumber(&oldtup->t_self), + ItemPointerGetOffsetNumber(&oldtup->t_self)); +} +#endif + /* * Check if the specified attribute's values are the same. Subroutine for * HeapDetermineColumnsInfo. @@ -6038,34 +6081,45 @@ heap_abort_speculative(Relation relation, ItemPointer tid) } /* - * heap_inplace_update - update a tuple "in place" (ie, overwrite it) + * heap_inplace_update_scan - update a row "in place" (ie, overwrite it) * - * Overwriting violates both MVCC and transactional safety, so the uses - * of this function in Postgres are extremely limited. Nonetheless we - * find some places to use it. + * Overwriting violates both MVCC and transactional safety, so the uses of + * this function in Postgres are extremely limited. 
Nonetheless we find some + * places to use it. See README.tuplock section "Locking to write + * inplace-updated tables" and later sections for expectations of readers and + * writers of a table that gets inplace updates. Standard flow: * - * The tuple cannot change size, and therefore it's reasonable to assume - * that its null bitmap (if any) doesn't change either. So we just - * overwrite the data portion of the tuple without touching the null - * bitmap or any of the header fields. + * ... [any slow preparation not requiring oldtup] ... + * heap_inplace_update_scan([...], &tup, &inplace_state); + * if (!HeapTupleIsValid(tup)) + * elog(ERROR, [...]); + * ... [buffer is exclusive-locked; mutate "tup"] ... + * if (dirty) + * heap_inplace_update_finish(inplace_state, tup); + * else + * heap_inplace_update_cancel(inplace_state); * - * tuple is an in-memory tuple structure containing the data to be written - * over the target tuple. Also, tuple->t_self identifies the target tuple. + * Since this is intended for system catalogs and SERIALIZABLE doesn't cover + * DDL, this skips some predicate locks. * - * Note that the tuple updated here had better not come directly from the - * syscache if the relation has a toast relation as this tuple could - * include toast values that have been expanded, causing a failure here. + * The first several params duplicate the systable_beginscan() param list. + * "oldtupcopy" is an output parameter, assigned NULL if the key ceases to + * find a live tuple. (In PROC_IN_VACUUM, that is a low-probability transient + * condition.) If "oldtupcopy" gets non-NULL, you must pass output parameter + * "state" to heap_inplace_update_finish() or heap_inplace_update_cancel(). */ void -heap_inplace_update(Relation relation, HeapTuple tuple) +heap_inplace_update_scan(Relation relation, + Oid indexId, + bool indexOK, + Snapshot snapshot, + int nkeys, const ScanKeyData *key, + HeapTuple *oldtupcopy, void **state) { - Buffer buffer; - Page page; - OffsetNumber offnum; - ItemId lp = NULL; - HeapTupleHeader htup; - uint32 oldlen; - uint32 newlen; + ScanKey mutable_key = palloc(sizeof(ScanKeyData) * nkeys); + int retries = 0; + SysScanDesc scan; + HeapTuple oldtup; /* * For now, we don't allow parallel updates. Unlike a regular update, @@ -6078,21 +6132,70 @@ heap_inplace_update(Relation relation, HeapTuple tuple) (errcode(ERRCODE_INVALID_TRANSACTION_STATE), errmsg("cannot update tuples during a parallel operation"))); - INJECTION_POINT("inplace-before-pin"); - buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self))); - LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); - page = (Page) BufferGetPage(buffer); + /* + * Accept a snapshot argument, for symmetry, but this function advances + * its snapshot as needed to reach the tail of the updated tuple chain. + */ + Assert(snapshot == NULL); - offnum = ItemPointerGetOffsetNumber(&(tuple->t_self)); - if (PageGetMaxOffsetNumber(page) >= offnum) - lp = PageGetItemId(page, offnum); + Assert(IsInplaceUpdateRelation(relation) || !IsSystemRelation(relation)); - if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp)) - elog(ERROR, "invalid lp"); + /* Loop for an exclusive-locked buffer of a non-updated tuple. */ + do + { + CHECK_FOR_INTERRUPTS(); - htup = (HeapTupleHeader) PageGetItem(page, lp); + /* + * Processes issuing heap_update (e.g. GRANT) at maximum speed could + * drive us to this error. A hostile table owner has stronger ways to + * damage their own table, so that's minor. 
+ */ + if (retries++ > 10000) + elog(ERROR, "giving up after too many tries to overwrite row"); - oldlen = ItemIdGetLength(lp) - htup->t_hoff; + memcpy(mutable_key, key, sizeof(ScanKeyData) * nkeys); + INJECTION_POINT("inplace-before-pin"); + scan = systable_beginscan(relation, indexId, indexOK, snapshot, + nkeys, mutable_key); + oldtup = systable_getnext(scan); + if (!HeapTupleIsValid(oldtup)) + { + systable_endscan(scan); + *oldtupcopy = NULL; + return; + } + +#ifdef USE_ASSERT_CHECKING + if (RelationGetRelid(relation) == RelationRelationId) + check_inplace_rel_lock(oldtup); +#endif + } while (!inplace_xmax_lock(scan)); + + *oldtupcopy = heap_copytuple(oldtup); + *state = scan; +} + +/* + * heap_inplace_update_finish - second phase of heap_inplace_update_scan() + * + * The tuple cannot change size, and therefore its header fields and null + * bitmap (if any) don't change either. + */ +void +heap_inplace_update_finish(void *state, HeapTuple tuple) +{ + SysScanDesc scan = (SysScanDesc) state; + TupleTableSlot *slot = scan->slot; + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; + HeapTuple oldtup = bslot->base.tuple; + HeapTupleHeader htup = oldtup->t_data; + Buffer buffer = bslot->buffer; + Relation relation = scan->heap_rel; + uint32 oldlen; + uint32 newlen; + + Assert(ItemPointerEquals(&oldtup->t_self, &tuple->t_self)); + oldlen = oldtup->t_len - htup->t_hoff; newlen = tuple->t_len - tuple->t_data->t_hoff; if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff) elog(ERROR, "wrong tuple length"); @@ -6104,6 +6207,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple) (char *) tuple->t_data + tuple->t_data->t_hoff, newlen); + /*---------- + * XXX A crash here can allow datfrozenxid to get ahead of relfrozenxid: + * + * ["D" is a VACUUM (ONLY_DATABASE_STATS)] + * ["R" is a VACUUM tbl] + * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class) + * D: systable_getnext() returns pg_class tuple of tbl + * R: memcpy() into pg_class tuple of tbl + * D: raise pg_database.datfrozenxid, XLogInsert(), finish + * [crash] + * [recovery restores datfrozenxid w/o relfrozenxid] + */ + MarkBufferDirty(buffer); /* XLOG stuff */ @@ -6124,23 +6240,188 @@ heap_inplace_update(Relation relation, HeapTuple tuple) recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE); - PageSetLSN(page, recptr); + PageSetLSN(BufferGetPage(buffer), recptr); } END_CRIT_SECTION(); - UnlockReleaseBuffer(buffer); + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + systable_endscan(scan); /* * Send out shared cache inval if necessary. Note that because we only * pass the new version of the tuple, this mustn't be used for any * operations that could change catcache lookup keys. But we aren't * bothering with index updates either, so that's true a fortiori. + * + * XXX ROLLBACK discards the invalidation. See test inplace-inval.spec. */ if (!IsBootstrapProcessingMode()) CacheInvalidateHeapTuple(relation, tuple, NULL); } +/* + * heap_inplace_update_cancel - abandon a heap_inplace_update_scan() + * + * This is an alternative to making a no-op update. + */ +void +heap_inplace_update_cancel(void *state) +{ + SysScanDesc scan = (SysScanDesc) state; + TupleTableSlot *slot = scan->slot; + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; + Buffer buffer = bslot->buffer; + + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + systable_endscan(scan); +} + +/* + * inplace_xmax_lock - protect inplace update from concurrent heap_update() + * + * This operates on the last tuple that systable_getnext() returned.
Evaluate + * whether the tuple's state is compatible with a no-key update. Current + * transaction rowmarks are fine, as is KEY SHARE from any transaction. If + * compatible, return true with the buffer exclusive-locked. Otherwise, + * return false after blocking transactions, if any, have ended. + * + * One could modify this to return true for tuples with a delete in progress, + * but all inplace updaters take a lock that conflicts with DROP, so such a + * delete shouldn't be in progress. If it does happen somehow, we'll wait for + * it like we would an update. + * + * Readers of inplace-updated fields expect that changes to those fields are + * durable. For example, vac_truncate_clog() reads datfrozenxid from + * pg_database tuples via catalog snapshots. A future snapshot must not + * return a lower datfrozenxid for the same database OID (lower in the + * FullTransactionIdPrecedes() sense). We achieve that since no update of a + * tuple can start while we hold a lock on its buffer. In cases like + * BEGIN;GRANT;CREATE INDEX;COMMIT we're inplace-updating a tuple visible only + * to this transaction. ROLLBACK then is one case where it's okay to lose + * inplace updates. (Restoring relhasindex=false on ROLLBACK is fine, since + * any concurrent CREATE INDEX would have blocked, then inplace-updated the + * committed tuple.) + * + * In principle, we could avoid waiting by overwriting every tuple in the + * updated tuple chain. Reader expectations permit updating a tuple only if + * it's aborted, is the tail of the chain, or we already updated the tuple + * referenced in its t_ctid. Hence, we would need to overwrite the tuples in + * order from tail to head. That would tolerate either (a) mutating all + * tuples in one critical section or (b) accepting a chance of partial + * completion. Partial completion of a relfrozenxid update would have the + * weird consequence that the table's next VACUUM could see the table's + * relfrozenxid move forward between vacuum_get_cutoffs() and finishing. + */ +static bool +inplace_xmax_lock(SysScanDesc scan) +{ + TupleTableSlot *slot = scan->slot; + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot; + HeapTupleData oldtup = *bslot->base.tuple; + Buffer buffer = bslot->buffer; + TM_Result result; + bool ret; + + Assert(TTS_IS_BUFFERTUPLE(slot)); + Assert(BufferIsValid(buffer)); + + LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); + + /*---------- + * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except: + * + * - wait unconditionally + * - no tuple locks + * - don't recheck header after wait: simpler to defer to next iteration + * - don't try to continue even if the updater aborts: likewise + * - no crosscheck + */ + result = HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false), + buffer); + + if (result == TM_Invisible) + { + /* no known way this can happen */ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg_internal("attempted to overwrite invisible tuple"))); + } + else if (result == TM_SelfModified) + { + /* + * CREATE INDEX might reach this if an expression is silly enough to + * call e.g. SELECT ... FROM pg_class FOR SHARE.
+ */ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("tuple to be updated was already modified by an operation triggered by the current command"))); + } + else if (result == TM_BeingModified) + { + TransactionId xwait; + uint16 infomask; + Relation relation; + + xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data); + infomask = oldtup.t_data->t_infomask; + relation = scan->heap_rel; + + if (infomask & HEAP_XMAX_IS_MULTI) + { + LockTupleMode lockmode = LockTupleNoKeyExclusive; + MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate; + int remain; + bool current_is_member; + + if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask, + lockmode, &current_is_member)) + { + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + systable_endscan(scan); + ret = false; + MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask, + relation, &oldtup.t_self, XLTW_Update, + &remain); + } + else + ret = true; + } + else if (TransactionIdIsCurrentTransactionId(xwait)) + ret = true; + else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask)) + ret = true; + else + { + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + systable_endscan(scan); + ret = false; + XactLockTableWait(xwait, relation, &oldtup.t_self, + XLTW_Update); + } + } + else + { + ret = (result == TM_Ok); + if (!ret) + { + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + systable_endscan(scan); + } + } + + /* + * GetCatalogSnapshot() relies on invalidation messages to know when to + * take a new snapshot. COMMIT of xwait is responsible for sending the + * invalidation. We're not acquiring heavyweight locks sufficient to + * block if not yet sent, so we must take a new snapshot to avoid spinning + * that ends with a "too many tries" error. While we don't need this if + * xwait aborted, don't bother optimizing that. + */ + if (!ret) + InvalidateCatalogSnapshot(); + return ret; +} + #define FRM_NOOP 0x0001 #define FRM_INVALIDATE_XMAX 0x0002 #define FRM_RETURN_IS_XID 0x0004 diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 5c48e57..00b3e4f 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2788,7 +2788,9 @@ index_update_stats(Relation rel, { Oid relid = RelationGetRelid(rel); Relation pg_class; + ScanKeyData key[1]; HeapTuple tuple; + void *state; Form_pg_class rd_rel; bool dirty; @@ -2822,33 +2824,12 @@ index_update_stats(Relation rel, pg_class = table_open(RelationRelationId, RowExclusiveLock); - /* - * Make a copy of the tuple to update. Normally we use the syscache, but - * we can't rely on that during bootstrap or while reindexing pg_class - * itself.
- */ - if (IsBootstrapProcessingMode() || - ReindexIsProcessingHeap(RelationRelationId)) - { - /* don't assume syscache will work */ - TableScanDesc pg_class_scan; - ScanKeyData key[1]; - - ScanKeyInit(&key[0], - Anum_pg_class_oid, - BTEqualStrategyNumber, F_OIDEQ, - ObjectIdGetDatum(relid)); - - pg_class_scan = table_beginscan_catalog(pg_class, 1, key); - tuple = heap_getnext(pg_class_scan, ForwardScanDirection); - tuple = heap_copytuple(tuple); - table_endscan(pg_class_scan); - } - else - { - /* normal case, use syscache */ - tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid)); - } + ScanKeyInit(&key[0], + Anum_pg_class_oid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relid)); + heap_inplace_update_scan(pg_class, ClassOidIndexId, true, NULL, 1, key, + &tuple, &state); if (!HeapTupleIsValid(tuple)) elog(ERROR, "could not find tuple for relation %u", relid); @@ -2911,11 +2892,12 @@ index_update_stats(Relation rel, */ if (dirty) { - heap_inplace_update(pg_class, tuple); + heap_inplace_update_finish(state, tuple); /* the above sends a cache inval message */ } else { + heap_inplace_update_cancel(state); /* no need to change tuple, but force relcache inval anyway */ CacheInvalidateRelcacheByTuple(tuple); } diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c index 738bc46..c882f3c 100644 --- a/src/backend/catalog/toasting.c +++ b/src/backend/catalog/toasting.c @@ -29,6 +29,7 @@ #include "catalog/toasting.h" #include "miscadmin.h" #include "nodes/makefuncs.h" +#include "utils/fmgroids.h" #include "utils/rel.h" #include "utils/syscache.h" @@ -333,21 +334,36 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, */ class_rel = table_open(RelationRelationId, RowExclusiveLock); - reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid)); - if (!HeapTupleIsValid(reltup)) - elog(ERROR, "cache lookup failed for relation %u", relOid); - - ((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid; - if (!IsBootstrapProcessingMode()) { /* normal case, use a transactional update */ + reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid)); + if (!HeapTupleIsValid(reltup)) + elog(ERROR, "cache lookup failed for relation %u", relOid); + + ((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid; + CatalogTupleUpdate(class_rel, &reltup->t_self, reltup); } else { /* While bootstrapping, we cannot UPDATE, so overwrite in-place */ - heap_inplace_update(class_rel, reltup); + + ScanKeyData key[1]; + void *state; + + ScanKeyInit(&key[0], + Anum_pg_class_oid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relOid)); + heap_inplace_update_scan(class_rel, ClassOidIndexId, true, + NULL, 1, key, &reltup, &state); + if (!HeapTupleIsValid(reltup)) + elog(ERROR, "cache lookup failed for relation %u", relOid); + + ((Form_pg_class) GETSTRUCT(reltup))->reltoastrelid = toast_relid; + + heap_inplace_update_finish(state, reltup); } heap_freetuple(reltup); diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index be629ea..da4d2b7 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1637,6 +1637,8 @@ dropdb(const char *dbname, bool missing_ok, bool force) bool db_istemplate; Relation pgdbrel; HeapTuple tup; + ScanKeyData key[1]; + void *inplace_state; Form_pg_database datform; int notherbackends; int npreparedxacts; @@ -1774,11 +1776,6 @@ dropdb(const char *dbname, bool missing_ok, bool force) */ pgstat_drop_database(db_id); - tup = SearchSysCacheCopy1(DATABASEOID, 
ObjectIdGetDatum(db_id)); - if (!HeapTupleIsValid(tup)) - elog(ERROR, "cache lookup failed for database %u", db_id); - datform = (Form_pg_database) GETSTRUCT(tup); - /* * Except for the deletion of the catalog row, subsequent actions are not * transactional (consider DropDatabaseBuffers() discarding modified @@ -1790,8 +1787,17 @@ dropdb(const char *dbname, bool missing_ok, bool force) * modification is durable before performing irreversible filesystem * operations. */ + ScanKeyInit(&key[0], + Anum_pg_database_oid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(db_id)); + heap_inplace_update_scan(pgdbrel, DatabaseOidIndexId, true, + NULL, 1, key, &tup, &inplace_state); + if (!HeapTupleIsValid(tup)) + elog(ERROR, "cache lookup failed for database %u", db_id); + datform = (Form_pg_database) GETSTRUCT(tup); datform->datconnlimit = DATCONNLIMIT_INVALID_DB; - heap_inplace_update(pgdbrel, tup); + heap_inplace_update_finish(inplace_state, tup); XLogFlush(XactLastRecEnd); /* @@ -1799,6 +1805,7 @@ dropdb(const char *dbname, bool missing_ok, bool force) * the row will be gone, but if we fail, dropdb() can be invoked again. */ CatalogTupleDelete(pgdbrel, &tup->t_self); + heap_freetuple(tup); /* * Drop db-specific replication slots. diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c index 7a5ed6b..22d0ce7 100644 --- a/src/backend/commands/event_trigger.c +++ b/src/backend/commands/event_trigger.c @@ -946,25 +946,18 @@ EventTriggerOnLogin(void) { Relation pg_db = table_open(DatabaseRelationId, RowExclusiveLock); HeapTuple tuple; + void *state; Form_pg_database db; ScanKeyData key[1]; - SysScanDesc scan; - /* - * Get the pg_database tuple to scribble on. Note that this does - * not directly rely on the syscache to avoid issues with - * flattened toast values for the in-place update. - */ + /* Fetch a copy of the tuple to scribble on */ ScanKeyInit(&key[0], Anum_pg_database_oid, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(MyDatabaseId)); - scan = systable_beginscan(pg_db, DatabaseOidIndexId, true, - NULL, 1, key); - tuple = systable_getnext(scan); - tuple = heap_copytuple(tuple); - systable_endscan(scan); + heap_inplace_update_scan(pg_db, DatabaseOidIndexId, true, + NULL, 1, key, &tuple, &state); if (!HeapTupleIsValid(tuple)) elog(ERROR, "could not find tuple for database %u", MyDatabaseId); @@ -980,13 +973,15 @@ EventTriggerOnLogin(void) * that avoids possible waiting on the row-level lock. Second, * that avoids dealing with TOAST. * - * It's known that changes made by heap_inplace_update() may - * be lost due to concurrent normal updates. However, we are - * OK with that. The subsequent connections will still have a - * chance to set "dathasloginevt" to false. + * Changes made by inplace update may be lost due to + * concurrent normal updates; see inplace-inval.spec. However, + * we are OK with that. The subsequent connections will still + * have a chance to set "dathasloginevt" to false. 
*/ - heap_inplace_update(pg_db, tuple); + heap_inplace_update_finish(state, tuple); } + else + heap_inplace_update_cancel(state); table_close(pg_db, RowExclusiveLock); heap_freetuple(tuple); } diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 521ee74..64b9e9d 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -1405,7 +1405,9 @@ vac_update_relstats(Relation relation, { Oid relid = RelationGetRelid(relation); Relation rd; + ScanKeyData key[1]; HeapTuple ctup; + void *inplace_state; Form_pg_class pgcform; bool dirty, futurexid, @@ -1416,7 +1418,12 @@ vac_update_relstats(Relation relation, rd = table_open(RelationRelationId, RowExclusiveLock); /* Fetch a copy of the tuple to scribble on */ - ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid)); + ScanKeyInit(&key[0], + Anum_pg_class_oid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relid)); + heap_inplace_update_scan(rd, ClassOidIndexId, true, + NULL, 1, key, &ctup, &inplace_state); if (!HeapTupleIsValid(ctup)) elog(ERROR, "pg_class entry for relid %u vanished during vacuuming", relid); @@ -1524,7 +1531,9 @@ vac_update_relstats(Relation relation, /* If anything changed, write out the tuple. */ if (dirty) - heap_inplace_update(rd, ctup); + heap_inplace_update_finish(inplace_state, ctup); + else + heap_inplace_update_cancel(inplace_state); table_close(rd, RowExclusiveLock); @@ -1576,6 +1585,7 @@ vac_update_datfrozenxid(void) bool bogus = false; bool dirty = false; ScanKeyData key[1]; + void *inplace_state; /* * Restrict this task to one backend per database. This avoids race @@ -1699,20 +1709,18 @@ vac_update_datfrozenxid(void) relation = table_open(DatabaseRelationId, RowExclusiveLock); /* - * Get the pg_database tuple to scribble on. Note that this does not - * directly rely on the syscache to avoid issues with flattened toast - * values for the in-place update. + * Fetch a copy of the tuple to scribble on. We could check the syscache + * tuple first. If that concluded !dirty, we'd avoid waiting on + * concurrent heap_update() and would avoid exclusive-locking the buffer. + * For now, don't optimize that. 
*/ ScanKeyInit(&key[0], Anum_pg_database_oid, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(MyDatabaseId)); - scan = systable_beginscan(relation, DatabaseOidIndexId, true, - NULL, 1, key); - tuple = systable_getnext(scan); - tuple = heap_copytuple(tuple); - systable_endscan(scan); + heap_inplace_update_scan(relation, DatabaseOidIndexId, true, + NULL, 1, key, &tuple, &inplace_state); if (!HeapTupleIsValid(tuple)) elog(ERROR, "could not find tuple for database %u", MyDatabaseId); @@ -1746,7 +1754,9 @@ vac_update_datfrozenxid(void) newMinMulti = dbform->datminmxid; if (dirty) - heap_inplace_update(relation, tuple); + heap_inplace_update_finish(inplace_state, tuple); + else + heap_inplace_update_cancel(inplace_state); heap_freetuple(tuple); table_close(relation, RowExclusiveLock); diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c index fe3cda2..55f5cce 100644 --- a/src/backend/storage/lmgr/lmgr.c +++ b/src/backend/storage/lmgr/lmgr.c @@ -335,32 +335,7 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger) relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); - if (LockHeldByMe(&tag, lockmode)) - return true; - - if (orstronger) - { - LOCKMODE slockmode; - - for (slockmode = lockmode + 1; - slockmode <= MaxLockMode; - slockmode++) - { - if (LockHeldByMe(&tag, slockmode)) - { -#ifdef NOT_USED - /* Sometimes this might be useful for debugging purposes */ - elog(WARNING, "lock mode %s substituted for %s on relation %s", - GetLockmodeName(tag.locktag_lockmethodid, slockmode), - GetLockmodeName(tag.locktag_lockmethodid, lockmode), - RelationGetRelationName(relation)); -#endif - return true; - } - } - } - - return false; + return LockHeldByMe(&tag, lockmode, orstronger); } /* diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 7229f21..60b746d 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -578,11 +578,17 @@ DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2) } /* - * LockHeldByMe -- test whether lock 'locktag' is held with mode 'lockmode' - * by the current transaction + * LockHeldByMe -- test whether lock 'locktag' is held by the current + * transaction + * + * Returns true if current transaction holds a lock on 'tag' of mode + * 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK. + * ("Stronger" is defined as "numerically higher", which is a bit + * semantically dubious but is OK for the purposes we use this for.) 
*/ bool -LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode) +LockHeldByMe(const LOCKTAG *locktag, + LOCKMODE lockmode, bool orstronger) { LOCALLOCKTAG localtag; LOCALLOCK *locallock; @@ -598,7 +604,23 @@ LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode) &localtag, HASH_FIND, NULL); - return (locallock && locallock->nLocks > 0); + if (locallock && locallock->nLocks > 0) + return true; + + if (orstronger) + { + LOCKMODE slockmode; + + for (slockmode = lockmode + 1; + slockmode <= MaxLockMode; + slockmode++) + { + if (LockHeldByMe(locktag, slockmode, false)) + return true; + } + } + + return false; } #ifdef USE_ASSERT_CHECKING diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index c47a504..33e7134 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -336,7 +336,14 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple, bool follow_updates, Buffer *buffer, struct TM_FailureData *tmfd); -extern void heap_inplace_update(Relation relation, HeapTuple tuple); +extern void heap_inplace_update_scan(Relation relation, + Oid indexId, + bool indexOK, + Snapshot snapshot, + int nkeys, const ScanKeyData *key, + HeapTuple *oldtupcopy, void **state); +extern void heap_inplace_update_finish(void *state, HeapTuple tuple); +extern void heap_inplace_update_cancel(void *state); extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, const struct VacuumCutoffs *cutoffs, HeapPageFreeze *pagefrz, diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h index 0017d4b..cc1f6e7 100644 --- a/src/include/storage/lock.h +++ b/src/include/storage/lock.h @@ -567,7 +567,8 @@ extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks); extern void LockReleaseSession(LOCKMETHODID lockmethodid); extern void LockReleaseCurrentOwner(LOCALLOCK **locallocks, int nlocks); extern void LockReassignCurrentOwner(LOCALLOCK **locallocks, int nlocks); -extern bool LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode); +extern bool LockHeldByMe(const LOCKTAG *locktag, + LOCKMODE lockmode, bool orstronger); #ifdef USE_ASSERT_CHECKING extern HTAB *GetLockMethodLocalHash(void); #endif diff --git a/src/test/isolation/expected/intra-grant-inplace-db.out b/src/test/isolation/expected/intra-grant-inplace-db.out index 432ece5..a91402c 100644 --- a/src/test/isolation/expected/intra-grant-inplace-db.out +++ b/src/test/isolation/expected/intra-grant-inplace-db.out @@ -9,20 +9,20 @@ step b1: BEGIN; step grant1: GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee; -step vac2: VACUUM (FREEZE); +step vac2: VACUUM (FREEZE); step snap3: INSERT INTO frozen_witness SELECT datfrozenxid FROM pg_database WHERE datname = current_catalog; step c1: COMMIT; +step vac2: <... completed> step cmp3: SELECT 'datfrozenxid retreated' FROM pg_database WHERE datname = current_catalog AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness); -?column? ----------------------- -datfrozenxid retreated -(1 row) +?column? +-------- +(0 rows) diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out index cc1e47a..c2a9841 100644 --- a/src/test/isolation/expected/intra-grant-inplace.out +++ b/src/test/isolation/expected/intra-grant-inplace.out @@ -14,15 +14,16 @@ relhasindex f (1 row) -step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); +step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); step c1: COMMIT; +step addk2: <... 
completed> step read2: SELECT relhasindex FROM pg_class WHERE oid = 'intra_grant_inplace'::regclass; relhasindex ----------- -f +t (1 row) @@ -58,8 +59,9 @@ relhasindex f (1 row) -step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); +step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); step r3: ROLLBACK; +step addk2: <... completed> starting permutation: b2 sfnku2 addk2 c2 step b2: BEGIN; @@ -122,17 +124,18 @@ relhasindex f (1 row) -step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); +step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); step r3: ROLLBACK; step grant1: <... completed> step c1: COMMIT; +step addk2: <... completed> step read2: SELECT relhasindex FROM pg_class WHERE oid = 'intra_grant_inplace'::regclass; relhasindex ----------- -f +t (1 row) diff --git a/src/test/isolation/specs/intra-grant-inplace-db.spec b/src/test/isolation/specs/intra-grant-inplace-db.spec index bbecd5d..9de40ec 100644 --- a/src/test/isolation/specs/intra-grant-inplace-db.spec +++ b/src/test/isolation/specs/intra-grant-inplace-db.spec @@ -42,5 +42,4 @@ step cmp3 { } -# XXX extant bug permutation snap3 b1 grant1 vac2(c1) snap3 c1 cmp3 diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec index 3cd696b..eed0b52 100644 --- a/src/test/isolation/specs/intra-grant-inplace.spec +++ b/src/test/isolation/specs/intra-grant-inplace.spec @@ -73,7 +73,7 @@ step keyshr5 { teardown { ROLLBACK; } -# XXX extant bugs: permutation comments refer to planned post-bugfix behavior +# XXX extant bugs: permutation comments refer to planned future LockTuple() permutation b1 diff --git a/src/test/modules/injection_points/expected/inplace.out b/src/test/modules/injection_points/expected/inplace.out index 123f45a..db7dab6 100644 --- a/src/test/modules/injection_points/expected/inplace.out +++ b/src/test/modules/injection_points/expected/inplace.out @@ -40,4 +40,301 @@ step read1: SELECT reltuples = -1 AS reltuples_unknown FROM pg_class WHERE oid = 'vactest.orig50'::regclass; -ERROR: could not create unique index "pg_class_oid_index" +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: begin2 grant2 vac1 c2 vac3 mkrels3 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step c2: COMMIT; +step vac3: VACUUM pg_class; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step vac1: <... 
completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: begin2 grant2 vac1 r2 vac3 mkrels3 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step r2: ROLLBACK; +step vac3: VACUUM pg_class; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step vac1: <... completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: begin2 grant2 vac1 c2 revoke2 grant2 vac3 mkrels3 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step c2: COMMIT; +step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac3: VACUUM pg_class; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step vac1: <... completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: vac1 begin2 grant2 revoke2 mkrels3 c2 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step c2: COMMIT; +step vac1: <... 
completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: begin2 grant2 vac1 r2 grant2 revoke2 vac3 mkrels3 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step r2: ROLLBACK; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; +step vac3: VACUUM pg_class; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step vac1: <... completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + + +starting permutation: begin2 grant2 vac1 c2 revoke2 vac3 mkrels3 read1 +mkrels +------ + +(1 row) + +injection_points_attach +----------------------- + +(1 row) + +step begin2: BEGIN; +step grant2: GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; +step vac1: VACUUM vactest.orig50; -- wait during inplace update +step c2: COMMIT; +step revoke2: REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; +step vac3: VACUUM pg_class; +step mkrels3: + SELECT vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED + SELECT injection_points_detach('inplace-before-pin'); + SELECT injection_points_wakeup('inplace-before-pin'); + +mkrels +------ + +(1 row) + +injection_points_detach +----------------------- + +(1 row) + +injection_points_wakeup +----------------------- + +(1 row) + +step vac1: <... completed> +step read1: + REINDEX TABLE pg_class; -- look for duplicates + SELECT reltuples = -1 AS reltuples_unknown + FROM pg_class WHERE oid = 'vactest.orig50'::regclass; + +reltuples_unknown +----------------- +f +(1 row) + diff --git a/src/test/modules/injection_points/specs/inplace.spec b/src/test/modules/injection_points/specs/inplace.spec index e957713..86539a5 100644 --- a/src/test/modules/injection_points/specs/inplace.spec +++ b/src/test/modules/injection_points/specs/inplace.spec @@ -32,12 +32,9 @@ setup CREATE TABLE vactest.orig50 (); SELECT vactest.mkrels('orig', 51, 100); } - -# XXX DROP causes an assertion failure; adopt DROP once fixed teardown { - --DROP SCHEMA vactest CASCADE; - DO $$BEGIN EXECUTE 'ALTER SCHEMA vactest RENAME TO schema' || oid FROM pg_namespace where nspname = 'vactest'; END$$; + DROP SCHEMA vactest CASCADE; DROP EXTENSION injection_points; } @@ -56,11 +53,13 @@ step read1 { FROM pg_class WHERE oid = 'vactest.orig50'::regclass; } - # Transactional updates of the tuple vac1 is waiting to inplace-update. session s2 step grant2 { GRANT SELECT ON TABLE vactest.orig50 TO PUBLIC; } - +step revoke2 { REVOKE SELECT ON TABLE vactest.orig50 FROM PUBLIC; } +step begin2 { BEGIN; } +step c2 { COMMIT; } +step r2 { ROLLBACK; } # Non-blocking actions. 
session s3 @@ -74,10 +73,69 @@ step mkrels3 { } -# XXX extant bug +# target gains a successor at the last moment permutation vac1(mkrels3) # reads pg_class tuple T0 for vactest.orig50, xmax invalid grant2 # T0 becomes eligible for pruning, T1 is successor vac3 # T0 becomes LP_UNUSED - mkrels3 # T0 reused; vac1 wakes and overwrites the reused T0 + mkrels3 # vac1 wakes, scans to T1 read1 + +# target already has a successor, which commits +permutation + begin2 + grant2 # T0.t_ctid = T1 + vac1(mkrels3) # reads T0 for vactest.orig50 + c2 # T0 becomes eligible for pruning + vac3 # T0 becomes LP_UNUSED + mkrels3 # vac1 wakes, scans to T1 + read1 + +# target already has a successor, which becomes LP_UNUSED at the last moment +permutation + begin2 + grant2 # T0.t_ctid = T1 + vac1(mkrels3) # reads T0 for vactest.orig50 + r2 # T1 becomes eligible for pruning + vac3 # T1 becomes LP_UNUSED + mkrels3 # reuse T1; vac1 scans to T0 + read1 + +# target already has a successor, which becomes LP_REDIRECT at the last moment +permutation + begin2 + grant2 # T0.t_ctid = T1, non-HOT due to filled page + vac1(mkrels3) # reads T0 + c2 + revoke2 # HOT update to T2 + grant2 # HOT update to T3 + vac3 # T1 becomes LP_REDIRECT + mkrels3 # reuse T2; vac1 scans to T3 + read1 + +# waiting for updater to end +permutation + vac1(c2) # reads pg_class tuple T0 for vactest.orig50, xmax invalid + begin2 + grant2 # T0.t_ctid = T1, non-HOT due to filled page + revoke2 # HOT update to T2 + mkrels3 # vac1 awakes briefly, then waits for s2 + c2 + read1 + +# Another LP_UNUSED. This time, do change the live tuple. Final live tuple +# body is identical to original, at a different TID. +permutation + begin2 + grant2 # T0.t_ctid = T1, non-HOT due to filled page + vac1(mkrels3) # reads T0 + r2 # T1 becomes eligible for pruning + grant2 # T0.t_ctid = T2; T0 becomes eligible for pruning + revoke2 # T2.t_ctid = T3; T2 becomes eligible for pruning + vac3 # T0, T1 & T2 become LP_UNUSED + mkrels3 # reuse T0, T1 & T2; vac1 scans to T3 + read1 + +# Another LP_REDIRECT. Compared to the earlier test, omit the last grant2. +# Hence, final live tuple body is identical to original, at a different TID. +permutation begin2 grant2 vac1(mkrels3) c2 revoke2 vac3 mkrels3 read1
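As a closing illustration of the new API's standard flow (see the heap_inplace_update_scan() header comment and the vac_update_relstats() and index_update_stats() changes above), here is a minimal, hypothetical caller; the function name and error text are illustrative only, and the caller is assumed to already hold a relation lock satisfying the README.tuplock rules, e.g. ShareUpdateExclusiveLock on the target table:

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/skey.h"
#include "access/stratnum.h"
#include "access/table.h"
#include "catalog/pg_class.h"
#include "utils/fmgroids.h"

/* Hypothetical example; not part of the patch. */
static void
example_set_relhasindex(Oid relid, bool value)
{
	Relation	pg_class = table_open(RelationRelationId, RowExclusiveLock);
	ScanKeyData key[1];
	HeapTuple	ctup;
	void	   *state;
	Form_pg_class pgcform;

	/* Fetch a copy of the tuple to scribble on; its buffer is now x-locked. */
	ScanKeyInit(&key[0],
				Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(relid));
	heap_inplace_update_scan(pg_class, ClassOidIndexId, true,
							 NULL, 1, key, &ctup, &state);
	if (!HeapTupleIsValid(ctup))
		elog(ERROR, "pg_class entry for relid %u vanished", relid);

	pgcform = (Form_pg_class) GETSTRUCT(ctup);
	if (pgcform->relhasindex != value)
	{
		/* overwrite in place, write WAL, and send the cache inval */
		pgcform->relhasindex = value;
		heap_inplace_update_finish(state, ctup);
	}
	else
	{
		/* nothing to change; just release the buffer lock and end the scan */
		heap_inplace_update_cancel(state);
	}

	heap_freetuple(ctup);
	table_close(pg_class, RowExclusiveLock);
}

Note that heap_inplace_update_finish() both overwrites the tuple and sends the catcache invalidation, while heap_inplace_update_cancel() merely releases the buffer lock and ends the scan, matching how the vac_update_relstats() change handles the !dirty case.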