From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Support for REINDEX CONCURRENTLY |
Date: | 2013-01-28 10:39:16 |
Message-ID: | 20130128103916.GA4268@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
> On Sun, Jan 27, 2013 at 1:52 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
> > On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
> > > > As far as I understand that code its purpose is to enforce that all
> > > > potential users have an up2date definition available. For that we
> > > > acquire a lock on all virtualxids of users using that table thus waiting
> > > > for them to finish.
> > > > Consider the scenario where you have a workload where most transactions
> > > > are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
> > > > a_2, b_1, b_2). With the current strategy you will do:
> > > >
> > > > WaitForVirtualLocks(a_1) -- wait up to 10min
> > > > index_build(a_1)
> > > > WaitForVirtualLocks(a_2) -- wait up to 10min
> > > > index_build(a_2)
> > > >
> > > ...
> > > >
> > > > So instead of waiting up 10 minutes for that phase you have to wait up
> > > > to 40.
> > > >
> > > This is necessary if you want to process each index entry in a different
> > > transaction as WaitForVirtualLocks needs to wait for the locks held on the
> > > parent table. If you want to fo this wait once per transaction, the
> > > solution would be to group the index builds in the same transaction for all
> > > the indexes of the relation. One index per transaction looks more solid in
> > > this case if there is a failure during a process only one index will be
> > > incorrectly built.
> >
> > I cannot really follow you here.
> >
> OK let's be more explicit...
> > The reason why we need to wait here is
> > *only* to make sure that nobody still has the old list of indexes
> > around (which probably could even be relaxed for reindex concurrently,
> > but thats a separate optimization).
> >
> In order to do that, you need to wait for the *parent relations* and not
> the index themselves, no?
> Based on 2 facts:
> - each index build is done in a single transaction
> - a wait needs to be done on the parent relation before each transaction
> You need to wait for the parent relation multiple times depending on the
> number of indexes in it. You could optimize that by building all the
> indexes of the *same parent relation* in a single transaction.
I think youre misunderstanding how this part works a bit. We don't
acquire locks on the table itself, but we get a list of all transactions
we would conflict with if we were to acquire a lock of a certain
strength on the table (GetLockConflicts(locktag, mode)). We then wait
for each transaction in the resulting list via the VirtualXact mechanism
(VirtualXactLock(*lockholder)).
It doesn't matter all that waiting happens in the same transaction the
initial index build is done in as long as we keep the session locks
preventing other schema modifications. Nobody can go back and see an
older index list after we've done the above wait once.
So the following should be perfectly fine:
StartTransactionCommand();
BuildListOfIndexes();
foreach(index in indexes)
DefineNewIndex(index);
CommitTransactionCommand();
StartTransactionCommand();
foreach(table in tables)
GetLockConflicts()
foreach(conflict in conflicts)
VirtualXactLocks()
CommitTransactionCommand();
foreach(index in indexes)
StartTransactionCommand();
InitialIndexBuild(index)
CommitTransactionCommand();
...
> It looks that this feature has still too many disadvantages compared to the
> advantages it could bring in the current infrastructure (SnapshotNow
> problems, what to do with invalid toast indexes, etc.), so I would tend to
> agree with Tom and postpone this feature once infrastructure is more
> mature, one of the main things being the non-MVCC'ed catalogs.
I think while catalog mvcc snapshots would make this easier, most
problems, basically all but the switching of relations, are pretty much
independent from that fact. All the waiting etc, will still be there.
I can see an argument for pushing it to the next CF because its not
really there yet...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2013-01-28 10:44:36 | Re: logical changeset generation v4 - Heikki's thoughts about the patch state |
Previous Message | Heikki Linnakangas | 2013-01-28 09:59:52 | Re: logical changeset generation v4 - Heikki's thoughts about the patch state |