Re: Index corruption with CREATE INDEX CONCURRENTLY

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index corruption with CREATE INDEX CONCURRENTLY
Date: 2017-02-04 06:40:14
Message-ID: CAA4eK1Lh8bQVizws36sD=r1whxtvuaRc2YhUbqV1aEwwrqiD8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 4, 2017 at 12:12 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Pavan Deolasee wrote:
>
>> Looking at the history and some past discussions, it seems Tomas reported
>> somewhat similar problem and Andres proposed a patch here
>> https://www.postgresql.org/message-id/20140514155204.GE23943@awork2.anarazel.de
>> which got committed via b23b0f5588d2e2. Not exactly the same issue, but
>> related to relcache flush happening in index_open().
>>
>> I wonder if we should just add a recheck logic
>> in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at
>> the end, we go back and start again from RelationGetIndexList(). Or is
>> there any code path that relies on finding the old information?
>
> Good find on that old report. It occured to me to re-run Tomas' test
> case with this second patch you proposed (the "ddl" test takes 11
> minutes in my laptop), and lo and behold -- your "recheck" code enters
> an infinite loop. Not good. I tried some more tricks that came to
> mind, but I didn't get any good results. Pavan and I discussed it
> further and he came up with the idea of returning the bitmapset we
> compute, but if an invalidation occurs in index_open, simply do not
> cache the bitmapsets so that next time this is called they are all
> computed again. Patch attached.
>
> This appears to have appeased both test_decoding's ddl test under
> CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug.
>
> I intend to commit this soon to all branches, to ensure it gets into the
> next set of minors.
>

Thanks for detailed explanation, using steps mentioned in your mail
and Pavan's previous mail steps, I could reproduce the problem.

Regarding your fix:

+ * Now save copies of the bitmaps in the relcache entry, but only if no
+ * invalidation occured in the meantime. We intentionally set rd_indexattr
+ * last, because that's the one that signals validity of the values; if we
+ * run out of memory before making that copy, we won't leave the relcache
+ * entry looking like the other ones are valid but empty.
*/
- oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
- relation->rd_keyattr = bms_copy(uindexattrs);
- relation->rd_pkattr = bms_copy(pkindexattrs);
- relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_indexattr = bms_copy(indexattrs);
- MemoryContextSwitchTo(oldcxt);
+ if (relation->rd_indexvalid != 0)
+ {
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ relation->rd_keyattr = bms_copy(uindexattrs);
+ relation->rd_pkattr = bms_copy(pkindexattrs);
+ relation->rd_idattr = bms_copy(idindexattrs);
+ relation->rd_indexattr = bms_copy(indexattrs);
+ MemoryContextSwitchTo(oldcxt);
+ }

If we do above, then I think primary key attrs won't be returned
because for those we are using relation copy rather than an original
working copy of attrs. See code below:

switch (attrKind)
{
..
case INDEX_ATTR_BITMAP_PRIMARY_KEY:
return bms_copy(relation->rd_pkattr);

Apart from above, I think after the proposed patch, it will be quite
possible that in heap_update(), hot_attrs, key_attrs and id_attrs are
computed using different index lists (consider relcache flush happens
after computation of hot_attrs or key_attrs) which was previously not
possible. I think in the later code we assume that all the three
should be computed using the same indexlist. For ex. see the below
code:

HeapSatisfiesHOTandKeyUpdate()
{
..
Assert(bms_is_subset(id_attrs, key_attrs));
Assert(bms_is_subset(key_attrs, hot_attrs));
..
}

Also below comments in heap_update indicate that we shouldn't worry
about relcache flush between the computation of hot_attrs, key_attrs
and id_attrs.

heap_update()
{
..

* Note that we get a copy here, so we need not worry about relcache flush
* happening midway through.
*/

hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
id_attrs = RelationGetIndexAttrBitmap(relation,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
..
}

Typo.
/occured/occurred

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2017-02-04 07:19:33 Re: logical decoding of two-phase transactions
Previous Message Shinoda, Noriyoshi 2017-02-04 05:50:07 pg_sequences bug ?