Quick Links

Looking for settings/configuration for FASTEST reindex on idle system.

From:	Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Looking for settings/configuration for FASTEST reindex on idle system.
Date:	2014-01-09 22:03:25
Message-ID:	1389305005.50135.YahooMailNeo@web122901.mail.ne1.yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I have a maintenance window coming up and using pg_upgrade to upgrade from 9.2.X to 9.3.X.
As part of the window, I’d like to ‘cluster’ each table by its primary key. After doing so, I see amazing performance improvements (probably mostly because of index bloat - but possibly due to table fragmentation)

That being said, I have a single table that is blowing my window -
at 140 million rows (28 gig in size with 75 gig worth of indexes), this bad boy is my white whale. There are 10 indexes (not including the primary key). Yes - 10 is a lot - but I’ve been monitoring their use (most are single column or partial indexes) and all are used.

That being said, I’ve been reading and experimenting in trying to get a cluster of this table (which re-indexes all 10/11 indexes) to complete in a reasonable amount of time.

There are lots of settings and ranges to chose from and while my experiments continue, I was looking to get some input. Lowest I have gotten for clustering this table is just under 6 hours.

I am familiar with pg_reorg and it’s sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using ‘verbose’ that the actual clustering of the table is quick - it’s the reindexing that is slow (It’s doing each reindex sequentially instead of concurently)

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 gig of ram
2.7gig processors (48 cores)
Shared buffers set to 120gig
Maintenance work men set to 1gig
work men set to 500 meg

Things I have read/seen/been told to tweak…

fsync (set to off)
setting wal_level to minimal (to avoid wal logging of cluster activity)
bumping up maintenance work men (but I’ve also seen/read that uber high values cause disk based sorts which ultimately slow things down)
Tweaking checkpoint settings (although with wal_level set to minimal - I don’t think it comes into play)

any good suggestions for lighting a fire under this process?

If worse comes to worse, I can vacuum full the table and reindex each index concurrently - but it won’t give me the benefit of having the tuples ordered by their oft-grouped primary key.

Responses

Re: Looking for settings/configuration for FASTEST reindex on idle system. at 2014-01-09 23:22:25 from Jeff Amiel
Re: Looking for settings/configuration for FASTEST reindex on idle system. at 2014-01-10 05:37:50 from Sergey Konoplev
Re: Looking for settings/configuration for FASTEST reindex on idle system. at 2014-01-16 22:55:32 from Jeff Janes

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2014-01-09 23:09:24	Re: pg_restore - selective restore use cases. HINT use DROP CASCADE
Previous Message	Day, David	2014-01-09 21:51:39	Re: pg_restore - selective restore use cases. HINT use DROP CASCADE