Reindex taking forever, and 99% CPU

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Reindex taking forever, and 99% CPU
Date: 2014-08-03 01:20:44
Message-ID: CAFWfU=vhy2dLmMFX-c6d21C3w4VabL2F97BnLcnMedO8HDAO7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

One of my large tables (101 GB on disk, about 1.1 billion rows) used
to take too long to vacuum. Not sure if it's an index corruption
issue. But I tried VACUUM FULL ANALYZE as recommended in another
thread yesterday, which took 5 hours on the two times I tried, without
finishing.

Now the REINDEX TABLE has taken over 6 hours as I decided to be
patient and just let something finish. Not sure this is normal though!
How do production level DBAs do this if it takes so long?

If I open another SSH window to my server and try "select * from
pg_stats_activity" it just hangs there, as the REINDEX I presume is
taking up all the memory? I basically can't do anything else on this
server.

Just in case it helps, a segment of my postgresql.conf is below. Would
appreciate any tips on what I can do.

(I did a pg_dump of just this table, which also took about 2 hours,
then I renamed the original table in the database, and tried to
pg_restore just the table, but it gave me an error message about the
archive being in the wrong format !!! So REINDEX or something like it
seems to be the only idea?)

Thanks for any help!

PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
and TOP output during the running of the REINDEX are below..

----POSTGRESQL.CONF-----

max_connections = 180
superuser_reserved_connections = 5
shared_buffers = 512MB
effective_cache_size = 1200MB
temp_buffers = 32MB
maintenance_work_mem = 320MB
work_mem = 128MB
wal_buffers = 20MB
fsync = on
checkpoint_segments = 128
checkpoint_timeout = 1000
enable_indexscan = on

# AUTOVAC
autovacuum = on
autovacuum_max_workers = 5 # max number of autovacuum subprocesses
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size
before vacuum
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 350
...

------TOP OUTPUT (db name changed for privacy, with the word "MYDOMAIN") -------

top - 21:18:51 up 22 days, 7:43, 2 users, load average: 1.20, 1.17, 1.18
Tasks: 214 total, 3 running, 211 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 1.6%sy, 0.0%ni, 71.9%id, 1.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 4046644k total, 4022324k used, 24320k free, 9880k buffers
Swap: 2096440k total, 177144k used, 1919296k free, 2526536k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21044 postgres 25 0 1102m 513m 76m R 97.7 13.0 432:03.46 postgres:
MYDOMAIN_MYDOMAIN MYDOMAIN [local] REINDEX
8812 root 18 0 1403m 53m 3344 S 2.3 1.4 377:33.38
./jre/bin/java -Djava.compiler=NONE -cp /usr/StorMan/RaidMan.jar com.
8319 named 24 0 317m 37m 1860 S 1.3 0.9 319:11.26
/usr/sbin/named -u named -4 -t /var/named/chroot
14184 nobody 15 0 266m 15m 5156 S 1.0 0.4 4:13.43 nginx:
worker process
14181 nobody 15 0 279m 34m 5160 S 0.7 0.9 4:13.93 nginx:
worker process
30285 root 15 0 12760 1188 820 R 0.7 0.0 0:00.03 top
282 root 10 -5 0 0 0 S 0.3 0.0 184:37.48 [kswapd0]
25093 nobody 16 0 334m 15m 5124 S 0.3 0.4 0:01.00
/usr/local/apache/bin/httpd -k restart -DSSL
25095 nobody 15 0 334m 15m 5256 S 0.3 0.4 0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25102 nobody 15 0 334m 15m 5120 S 0.3 0.4 0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25106 nobody 15 0 334m 15m 5416 S 0.3 0.4 0:00.99
/usr/local/apache/bin/httpd -k restart -DSSL
25109 nobody 15 0 334m 15m 5424 S 0.3 0.4 0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25113 nobody 16 0 334m 15m 4980 S 0.3 0.4 0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25115 nobody 16 0 334m 15m 5192 S 0.3 0.4 0:00.95
/usr/local/apache/bin/httpd -k restart -DSSL
25117 nobody 16 0 334m 15m 4988 S 0.3 0.4 0:00.97
/usr/local/apache/bin/httpd -k restart -DSSL
25119 nobody 16 0 334m 15m 5028 S 0.3 0.4 0:00.96
/usr/local/apache/bin/httpd -k restart -DSSL
31759 root 15 0 0 0 0 S 0.3 0.0 0:35.37 [pdflush]
1 root 15 0 10368 592 556 S 0.0 0.0 0:04.29 init [3]
2 root RT -5 0 0 0 S 0.0 0.0 0:06.24
[migration/0]
3 root 34 19 0 0 0 S 0.0 0.0 0:08.72
[ksoftirqd/0]
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00
[watchdog/0]
5 root RT -5 0 0 0 S 0.0 0.0 0:05.27
[migration/1]
6 root 34 19 0 0 0 S 0.0 0.0 3:49.89
[ksoftirqd/1]
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 [watchdog/1]

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-08-03 01:46:51 Re: Reindex taking forever, and 99% CPU
Previous Message Phoenix Kiula 2014-08-02 13:50:07 Re: Upgrading from 9.0.11 to 9.3.5 on CentOS 6 (64 bit)