Looking for settings/configuration for FASTEST reindex on idle system.

From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Looking for settings/configuration for FASTEST reindex on idle system.
Date: 2014-01-09 22:03:25
Message-ID: 1389305005.50135.YahooMailNeo@web122901.mail.ne1.yahoo.com
Lists: pgsql-general

I have a maintenance window coming up and am using pg_upgrade to upgrade from 9.2.X to 9.3.X.
As part of the window, I’d like to ‘cluster’ each table by its primary key. After doing so, I see amazing performance improvements (probably mostly because the index bloat is gone, but possibly also because table fragmentation is removed).

That being said, I have a single table that is blowing my window: at 140 million rows (28 GB in size with 75 GB worth of indexes), this bad boy is my white whale. There are 10 indexes (not including the primary key). Yes, 10 is a lot, but I’ve been monitoring their use (most are single-column or partial indexes) and all are used.

I’ve been reading and experimenting to get a cluster of this table (which rebuilds all 11 indexes: the 10 above plus the primary key) to complete in a reasonable amount of time.

There are lots of settings and ranges to choose from, and while my experiments continue, I was looking to get some input. The lowest I have gotten for clustering this table is just under 6 hours.

I am familiar with pg_reorg and its sibling pg_repack, but they call the base PostgreSQL reindex functions underneath, and I have learned by using ‘verbose’ that the actual clustering of the table is quick. It’s the reindexing that is slow (it rebuilds each index sequentially instead of concurrently).

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 GB of RAM
2.7 GHz processors (48 cores)
shared_buffers set to 120 GB
maintenance_work_mem set to 1 GB
work_mem set to 500 MB

Things I have read/seen/been told to tweak…

fsync (set to off)
setting wal_level to minimal (to avoid wal logging of cluster activity)
bumping up maintenance_work_mem (but I’ve also seen/read that uber high values cause disk-based sorts which ultimately slow things down)
Tweaking checkpoint settings (although with wal_level set to minimal, I don’t think they come into play)
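
For reference, the tweaks listed above could be written as a postgresql.conf fragment along these lines. This is only a sketch for PostgreSQL 9.3 and the maintenance window: the checkpoint_segments value is an illustrative assumption, not a tested recommendation, and archive_mode and max_wal_senders are shown because wal_level = minimal requires archiving and streaming replication to be disabled.

```
# postgresql.conf fragment, maintenance window only (PostgreSQL 9.3)
fsync = off                  # unsafe: a crash can corrupt the cluster; set back to on afterwards
wal_level = minimal          # lets CLUSTER and CREATE INDEX skip most WAL logging
archive_mode = off           # required for wal_level = minimal
max_wal_senders = 0          # required for wal_level = minimal
maintenance_work_mem = 1GB   # current value from this post
checkpoint_segments = 64     # assumed illustrative value; 9.3 default is 3
```

Changing wal_level, archive_mode, or max_wal_senders needs a server restart, so these would be set before the window starts and reverted (with another restart) at the end.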

Any good suggestions for lighting a fire under this process?

If worst comes to worst, I can vacuum full the table and reindex each index concurrently, but it won’t give me the benefit of having the tuples ordered by their oft-grouped primary key.
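
The fallback of rebuilding the indexes in separate sessions at the same time could be sketched roughly as follows. This is a minimal sketch, not a tested procedure: it assumes psql is on the PATH and can connect without a password prompt, and the database name, index names, and worker count are hypothetical. Each session gets its own maintenance_work_mem through the PGOPTIONS environment variable.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_commands(dbname, index_names):
    """Return one psql argv list per index, each running a single REINDEX INDEX.

    Index names are quoted as single identifiers, so pass unqualified names
    spelled exactly as they are stored in the catalog.
    """
    return [
        ["psql", "-d", dbname, "-c",
         "REINDEX INDEX {};".format(quote_ident(name))]
        for name in index_names
    ]


def session_env(maintenance_work_mem):
    """Copy the environment and set a per-session maintenance_work_mem."""
    env = dict(os.environ)
    env["PGOPTIONS"] = "-c maintenance_work_mem={}".format(maintenance_work_mem)
    return env


def reindex_in_parallel(dbname, index_names, workers=4,
                        maintenance_work_mem="1GB"):
    """Run one psql session per index, at most `workers` at a time.

    Returns the psql exit codes in the same order as index_names.
    """
    env = session_env(maintenance_work_mem)
    commands = build_commands(dbname, index_names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(subprocess.call, cmd, env=env)
                   for cmd in commands]
        return [future.result() for future in futures]
```

A call such as reindex_in_parallel("mydb", ["big_table_pkey", "big_table_idx_a"], workers=2) (names hypothetical) would start two sessions. REINDEX INDEX takes a share lock on the table, so rebuilds of different indexes on the same table can overlap, but writes are blocked while they run. Peak sort memory is roughly workers times maintenance_work_mem, so the two should be sized together.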
