Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Date: 2019-05-01 23:01:23
Message-ID: 13793.1556751683@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-05-01 14:44:12 -0400, Tom Lane wrote:
>> I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found
>> while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've
>> gotten a failure, your session is hosed:
>>
>> regression=# select 1;
>> ?column?
>> ----------
>> 1
>> (1 row)
>>
>> regression=# reindex index pg_class_oid_index;
>> psql: ERROR: could not read block 0 in file "base/16384/35344": read only 0 of 8192 bytes
>> regression=# select 1;
>> psql: ERROR: could not open file "base/16384/35344": No such file or directory

> Yea, I noticed that too. Lead me astray for a while, because it
> triggered apparent REINDEX failures for pg_index too, even though not
> actually related to the reindex.

It seems that the reason we fail to recover is that we detect the error
during CommandEndInvalidationMessages, after we've updated the relcache
entry for pg_class_oid_index; of course at that point, any additional
pg_class access is going to fail because it'll try to use the
not-yet-existent index. However, when we then abort the transaction,
we fail to revert pg_class_oid_index's relcache entry because *there is
not yet an invalidation queue entry telling us to do so*.

This is, therefore, an order-of-operations bug in
CommandEndInvalidationMessages. It should attach the "current command"
queue entries to PriorCmdInvalidMsgs *before* it processes them, not
after. In this way they'll be available to be reprocessed by
AtEOXact_Inval or AtEOSubXact_Inval if we fail during CCI.

(A different way we could attack this is to make AtEOXact_Inval and
AtEOSubXact_Inval process "current" messages, as they do not today.
But I think that's a relatively inefficient solution, as it would
force those messages to be reprocessed even in the much-more-common
case where we abort before reaching CommandEndInvalidationMessages.)

The attached quick-hack patch changes that around, and seems to survive
testing (though I've not completed a CCA run with it). The basic change
I had to make to make this work is to make AppendInvalidationMessageList
actually append the current-commands list to the prior-cmds list, not
prepend as it's really always done. (In that way, we can still scan the
current-commands list just by starting at its head.) The comments for
struct InvalidationChunk explicitly disclaim any significance to the order
of the entries, so this should be okay, and I'm not seeing any problem
in testing. It's been a long time since I wrote that code, but I think
the reason I did it like that was the thought that in a long transaction,
the prior-cmds list might be much longer than the current-commands list,
so iterating down the prior-cmds list could add an O(N^2) cost. The
easy answer to that, of course, is to make the lists doubly-linked,
and I'd do that before committing; this version is just for testing.
(It's short on comments, too.)

The thing I was worried about in RelationCacheInvalidate does seem
to be a red herring, at least fixing it is not necessary to make
the broken-session-state problem go away.

Comments?

regards, tom lane

Attachment Content-Type Size
wip-fix-for-error-during-CCI.patch text/x-diff 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-05-01 23:41:24 Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Previous Message Arthur Zakirov 2019-05-01 22:03:38 Re: to_timestamp docs