Re: create index concurrently - duplicate index to reduce time without an index

From: <Gareth(dot)Williams(at)csiro(dot)au>
To: <greg(at)2ndquadrant(dot)com>
Cc: <mabewlun(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: create index concurrently - duplicate index to reduce time without an index
Date: 2010-06-03 23:43:18
Message-ID: 007DECE986B47F4EABF823C1FBB19C6201026E19AF06@exvic-mbx04.nexus.csiro.au
Lists: pgsql-general

> -----Original Message-----
> From: Greg Smith [mailto:greg(at)2ndquadrant(dot)com]
-snip-
>
> Gareth.Williams wrote:
> > So the rest of the question is, if I have two indexes with identical
> definitions, what happens? I've confirmed that I can create indexes with
> identical definitions (except name) without postgres complaining - and
> without breaking the client on my test system - but I am wary of trying it
> on my production system where there is much more data (8GB) and I care
> about its integrity so much more.
> >
>
> The database doesn't care one bit if you create a bunch of redundant
> indexes. So long as one of them is around to satisfy the queries that
> need the index to run well, you're fine.
>
> The main thing you can't do with the index concurrently/rename shuffle
> you've discovered here is use that approach to concurrently rebuild an
> index that enforces a constraint or unique index. If your index is
> enforcing a PRIMARY KEY for example, you'll discover a major roadblock
> were you to try this same technique to rebuild it. Those are tied into
> the constraint portion of the system catalogs and manipulating them
> isn't so easy.
>
> Regular indexes that exist just to speed up queries, those you can
> rename around as you've been figuring out without any downside. From a
> general paranoia perspective, you should run an explicit ANALYZE on the
> underlying table after you finish the shuffle, just to make absolutely
> sure the right statistics are available afterwards.
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

Thanks Greg, Alban and others,

This has cleared up a misunderstanding I had about why one should reindex. Re-reading the documentation http://www.postgresql.org/docs/8.4/interactive/sql-reindex.html, it is clear now that reindexing or recreating an index should not normally be needed - certainly not to keep an index up-to-date. I would have guessed that VACUUM or VACUUM ANALYSE on the table the index is associated with would be sufficient to reclaim space for a 'bloated' index (though maybe only VACUUM FULL would help). In any case, we can leave reindexing or a full vacuum for outages where we are interrupting service anyway.

I was heartened by the responses and tried further testing (if it could not hurt, why not try it and see if it could be faster), but ran into a problem. A few times when I was trying to drop an index (before or after creating a duplicate index with 'concurrently'), the DROP INDEX stalled. It seems this was because of an existing connection:
postgres: rods ICAT 130.102.163.141(58061) idle in transaction
and new clients were blocked as well.
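
For reference, the shuffle I was testing is roughly the following sequence (index and table names here are illustrative, not the real ones from our schema):

  -- Build a duplicate of the existing index without blocking writers:
  CREATE INDEX CONCURRENTLY my_index_new ON my_table (some_column);

  -- Drop the old index - this is the step that stalled behind the
  -- "idle in transaction" session:
  DROP INDEX my_index;

  -- Give the new index the old name, then refresh statistics as Greg suggests:
  ALTER INDEX my_index_new RENAME TO my_index;
  ANALYZE my_table;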

Is that stall expected? Should we look for 'bugs' in our client, which seems to be holding a connection open in a transaction?
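
(For what it's worth, sessions like the one above can be spotted with a query along these lines; this assumes the 8.4-era pg_stat_activity columns - later releases renamed procpid to pid and report idle-in-transaction in a separate state column:)

  -- List sessions sitting idle inside an open transaction, oldest first:
  SELECT procpid, usename, client_addr, xact_start
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction'
  ORDER BY xact_start;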

For the application we have, I'm ready to give up on this line of investigation for optimisation and just vacuum analyse key tables regularly, with a fuller vacuum and perhaps a reindex during outages - though we aim for outages to be infrequent. The database holds data representing a virtual filesystem structure with millions of files (plus associated access controls, and information on underlying storage resources and replication). There is probably not much update or delete activity on the main data - at least compared with the total holdings and the new data/files regularly being added to the system.
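
(In practice that plan amounts to little more than the following, with a made-up table name standing in for the real ones:)

  -- Routinely, while the service is up:
  VACUUM ANALYZE some_key_table;

  -- During a planned outage, if an index has genuinely become bloated:
  REINDEX TABLE some_key_table;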

Thanks again,

Gareth

PS. Greg, I don't understand the issue with concurrently rebuilding an index that enforces a constraint or unique index. I don't think I care much right at the moment, but I'm generally interested and others might be too. Would you expect the CREATE INDEX to fail, to cause locking, or just a transient performance degradation?
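
(For anyone curious, a quick illustration of the roadblock using a made-up table: the replacement index can still be built concurrently, but the old one cannot simply be dropped because the constraint depends on it, and dropping the constraint instead takes an exclusive lock and leaves the key unenforced while it is re-added.)

  CREATE TABLE files (id bigint PRIMARY KEY, path text);

  -- A replacement unique index can be built without blocking writers:
  CREATE UNIQUE INDEX CONCURRENTLY files_id_new ON files (id);

  -- ...but the rename shuffle breaks down at the next step; this fails with
  -- something like: ERROR: cannot drop index "files_pkey" because constraint
  -- files_pkey on table files requires it
  DROP INDEX files_pkey;

  -- The alternative, ALTER TABLE files DROP CONSTRAINT files_pkey, takes an
  -- exclusive lock and leaves the primary key unenforced until it is re-added.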
