Quick Links

Re: Support for REINDEX CONCURRENTLY

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Support for REINDEX CONCURRENTLY
Date:	2013-03-05 16:49:51
Message-ID:	20130305164951.GG13803@alap2.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
> Thanks for the review. All your comments are addressed and updated patches
> are attached.
> Please see below for the details, and if you find anything else just let me
> know.
>
> On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > Have you benchmarked the toastrelidx removal stuff in any way? If not,
> > thats fine, but if yes I'd be interested.
> >
> No I haven't. Is it really that easily measurable? I think not, but me too
> I'd be interested in looking at such results.

I don't think its really measurable, at least not for modifications. But
istm that the onus to proof that to some degree is upon the patch.

> > + if (toastrel->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastrel);
> >
> > Hm, I think we should move this into a macro, this is cropping up at
> > more and more places.
> >
> This is not necessary. RelationGetIndexList does a check similar at its
> top, so I simply removed all those checks.

Well, in some of those cases a function call might be noticeable
(probably only in the toast fetch path). Thats why I suggested putting
the above in a macro...

>
> > > + for (count = 0; count < num_indexes; count++)
> > > + index_insert(toastidxs[count], t_values, t_isnull,
> > > + &(toasttup->t_self),
> > > + toastrel,
> > > +
> > toastidxs[count]->rd_index->indisunique ?
> > > + UNIQUE_CHECK_YES :
> > UNIQUE_CHECK_NO);
> >
> > The indisunique check looks like a copy & pasto to me, albeit not
> > yours...
> >
> Yes it is the same for all the indexes normally, but it looks more solid to
> me to do that as it is. So unchanged.

Hm, if the toast indexes aren't unique anymore loads of stuff would be
broken. Anyway, not your "fault".

> >
> > > + /* Obtain index list if necessary */
> > > + if (toastRel1->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastRel1);
> > > + if (toastRel2->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastRel2);
> > > +
> > > + /* Check if the swap is possible for all the toast indexes
> > */
> >
> > So there's no error being thrown if this turns out not to be possible?
> >
> There are no errors also in the former process... This should fail
> silently, no?

Not sure what you mean by "former process"? So far I don't see any
reason why it would be a good idea to fail silently. We end up with
corrupt data if the swap is silently not performed.

> > > + if (count == 0)
> > > + snprintf(NewToastName,
> > NAMEDATALEN, "pg_toast_%u_index",
> > > + OIDOldHeap);
> > > + else
> > > + snprintf(NewToastName,
> > NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > > + OIDOldHeap,
> > count);
> > > + RenameRelationInternal(lfirst_oid(lc),
> > > +
> > NewToastName);
> > > + count++;
> > > + }
> >
> > Hm. It seems wrong that this layer needs to know about _cct.
> >
> Any other idea? For the time being I removed cct and added only a suffix
> based on the index number...

Hm. It seems like throwing an error would be sufficient, that path is
only entered for shared catalogs, right? Having multiple toast indexes
would be a bug.

> > > + /*
> > > + * Index is considered as a constraint if it is PRIMARY KEY or
> > EXCLUSION.
> > > + */
> > > + isconstraint = indexRelation->rd_index->indisprimary ||
> > > + indexRelation->rd_index->indisexclusion;
> >
> > unique constraints aren't mattering here?
> >
> No they are not. Unique indexes are not counted as constraints in the case
> of index_create. Previous versions of the patch did that but there are
> issues with unique indexes using expressions.

Hm. index_create's comment says:
* isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint

There are unique indexes that are constraints and some that are
not. Looking at ->indisunique is not sufficient to determine whether its
one or not.

> > We probably should remove the fsm of the index altogether after this?
> >
> The freespace map? Not sure it is necessary here. Isn't it going to be
> removed with the relation anyway?

I had a thinko here, forgot what I said. I thought the freespacemap
would be the one from the old index, but htats clearly bogus. Comes from
writing reviews after having to leave home at 5 in the morning to catch
a plane ;)

> > > +void
> > > +index_concurrent_drop(Oid indexOid)
> > > +{
> > > + Oid constraintOid =
> > get_index_constraint(indexOid);
> > > + ObjectAddress object;
> > > + Form_pg_index indexForm;
> > > + Relation pg_index;
> > > + HeapTuple indexTuple;
> > > + bool indislive;
> > > +
> > > + /*
> > > + * Check that the index dropped here is not alive, it might be
> > used by
> > > + * other backends in this case.
> > > + */
> > > + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> > > +
> > > + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> > > +
> > ObjectIdGetDatum(indexOid));
> > > + if (!HeapTupleIsValid(indexTuple))
> > > + elog(ERROR, "cache lookup failed for index %u", indexOid);
> > > + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> > > + indislive = indexForm->indislive;
> > > +
> > > + /* Clean up */
> > > + heap_close(pg_index, RowExclusiveLock);
> > > +
> > > + /* Leave if index is still alive */
> > > + if (indislive)
> > > + return;
> >
> > This seems like a confusing path? Why is it valid to get here with a
> > valid index and why is it ok to silently ignore that case?
> >
> I added that because of a comment of one of the past reviews. Personally I
> think it makes more sense to remove that for clarity.

Imo it should be an elog(ERROR) or an Assert().

>
> > + case RELKIND_RELATION:
> > > + {
> > > + /*
> > > + * In the case of a relation, find all its
> > indexes
> > > + * including toast indexes.
> > > + */
> > > + Relation heapRelation =
> > heap_open(relationOid,
> > > +
> > ShareUpdateExclusiveLock);
> >
> > Hm. This means we will not notice having about-to-be dropped indexes
> > around. Which seems safe because locks will prevent that anyway...
> >
> I think that's OK as-is.

Yes. Just thinking out loud.

> > + default:
> > > + /* nothing to do */
> > > + break;
> >
> > Shouldn't we error out?
> >
> Don't think so. For example what if the relation is a matview? For REINDEX
> DATABASE this could finish as an error because a materialized view is
> listed as a relation to reindex. I prefer having this path failing silently
> and leave if there are no indexes.

Imo default fallthroughs makes it harder to adjust code. And afaik its
legal to add indexes to materialized views which kinda proofs my point.
And if that path is reached for plain views, sequences or toast tables
its an error.

> > > + /*
> > > + * Phase 3 of REINDEX CONCURRENTLY
> > > + *
> > > + * During this phase the concurrent indexes catch up with the
> > INSERT that
> > > + * might have occurred in the parent table and are marked as valid
> > once done.
> > > + *
> > > + * We once again wait until no transaction can have the table open
> > with
> > > + * the index marked as read-only for updates. Each index
> > validation is done
> > > + * with a separate transaction to avoid opening transaction for an
> > > + * unnecessary too long time.
> > > + */
> >
> > Maybe I am being dumb because I have the feeling I said differently in
> > the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> > The comment seems to say we need to do so.
> >
> Yes you said the contrary in a previous review. The purpose of this
> function is to first gather the locks and then wait for everything at once
> to reduce possible conflicts.

you say:

+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid opening transaction for an
+ * unnecessary too long time.

Which doesn't seem to be done?

I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct. Anyway, if
I claimed otherwise, I think I was wrong:

The reason - I think - we need to wait here is that otherwise its not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.
Any flaws in that logic?

...

Yes, reading the comments of validate_index() and the old implementation
seems to make my point.

> > > + /*
> > > + * Phase 6 of REINDEX CONCURRENTLY
> > > + *
> > > + * Drop the concurrent indexes. This needs to be done through
> > > + * performDeletion or related dependencies will not be dropped for
> > the old
> > > + * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is
> > not used
> > > + * as here the indexes are already considered as dead and invalid,
> > so they
> > > + * will not be used by other backends.
> > > + */
> > > + foreach(lc, concurrentIndexIds)
> > > + {
> > > + Oid indexOid = lfirst_oid(lc);
> > > +
> > > + /* Start transaction to drop this index */
> > > + StartTransactionCommand();
> > > +
> > > + /* Get fresh snapshot for next step */
> > > + PushActiveSnapshot(GetTransactionSnapshot());
> > > +
> > > + /*
> > > + * Open transaction if necessary, for the first index
> > treated its
> > > + * transaction has been already opened previously.
> > > + */
> > > + index_concurrent_drop(indexOid);
> > > +
> > > + /*
> > > + * For the last index to be treated, do not commit
> > transaction yet.
> > > + * This will be done once all the locks on indexes and
> > parent relations
> > > + * are released.
> > > + */
> >
> > Hm. This doesn't seem to commit the last transaction at all right now?
> >
> It is better like this. The end of the process needs to be done inside a
> transaction, so not committing immediately the last drop makes sense, no?

I pretty much dislike this. If we need to leave a transaction open
(why?), that should happen a function layer above.

>
>
> > Not sure why UnlockRelationIdForSession needs to be run in a transaction
> > anyway?
> >
> Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
> is run inside a transaction block.

I have no problem of doing so, I just dislike the way thats done in the
loop. You can just open a new one if its required, a transaction is
cheap, especially if it doesn't even acquire an xid.

Looking good.

I'll do some actual testing instead of just reviewing now...

Greetings,

Andres Freund

In response to

Re: Support for REINDEX CONCURRENTLY at 2013-03-05 13:35:16 from Michael Paquier

Responses

Re: Support for REINDEX CONCURRENTLY at 2013-03-06 04:21:27 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2013-03-05 17:45:08	Re: sql_drop Event Triggerg
Previous Message	Maciek Sakrejda	2013-03-05 16:37:58	Re: [HACKERS] Floating point error