From: | Greg Williamson <gwilliamson39(at)yahoo(dot)com> |
---|---|
To: | |
Cc: | "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1 |
Date: | 2012-10-02 22:02:23 |
Message-ID: | 1349215343.78143.YahooMailNeo@web125901.mail.ne1.yahoo.com |
Lists: | pgsql-admin |
I've done some more testing and the problem seems to be repmgr itself.
A few details below...
----- Original Message -----
> From: Greg Williamson <gwilliamson39(at)yahoo(dot)com>
> To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
> Sent: Thursday, September 27, 2012 7:23 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
> Tom --
>
> ----- Original Message -----
>> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> To: Greg Williamson <gwilliamson39(at)yahoo(dot)com>
>> Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
>> Sent: Thursday, September 27, 2012 7:14 PM
>> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>>
>> Greg Williamson <gwilliamson39(at)yahoo(dot)com> writes:
>>>> Have you checked to see if there are any processes that have open handles to
>>>> deleted files (lsof -X | grep deleted).
>>
>>> lsof -X | grep deleted | wc -l
>>
>>> shows: 835 such files.
>>
>>> A couple:
>>> postgres 2540 postgres 50u REG 8,3 409600 93429 /var/lib/postgresql/9.1/main/base/2789200/11816 (deleted)
>>> postgres 2540 postgres 51u REG 8,3 18112512 49694570 /var/lib/postgresql/9.1/main/base/2789200/2791679 (deleted)
>>> <...>
>>
>> So, which processes are holding these open, and what are they doing
>> exactly? Let's see output from ps and pg_stat_activity, maybe even
>> attach to them with gdb and get stack traces.
>>
>>> We've a planned restart scheduled soon which will let me find any
>>> scripts that might be keeping things open,
>>
>> A restart will destroy all the evidence, so let's not be in a hurry
>> to do that before we've identified what's happening.
>>
>> regards, tom lane
>>
>
> Thanks for the suggestions -- I'll post back when I have more info. Many of
> these do not seem to have a link to any identifiable process that is still
> running, but some do and they have pointed me away from the hourly drop /
> rebuild, at least for now. Looks like the stats database may be the issue.
>
> Greg W.
I turned off the cron job that did the hourly database create / drop and am still leaking disk space, but a bit slower -- only lost 2 gigs overnight.
While running this process I see these data directories:
postgres(at)db11:~$ ls -lrt 9.1/main/base
total 200
drwx------ 2 postgres postgres 6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres 8192 2012-10-01 08:17 16424
drwx------ 2 postgres postgres 32768 2012-10-01 19:20 3171846
When it is done (note the last directory is now gone):
postgres(at)db11:~$ ls -lrt 9.1/main/base
total 140
drwx------ 2 postgres postgres 6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres 8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres 8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres 8192 2012-10-01 08:17 16424
When I run lsof -X and grep for deleted files I see these 4 new entries added since the last database create/drop:
ase/3167420/3169915 (deleted)
postgres 21116 postgres 66u REG 8,3 19709952 136501576 /var/lib/postgresql/9.1/main/base/3171846/3174279 (deleted)
postgres 21116 postgres 67u REG 8,3 15450112 136501574 /var/lib/postgresql/9.1/main/base/3171846/3174278 (deleted)
postgres 21116 postgres 68u REG 8,3 28344320 136410873 /var/lib/postgresql/9.1/main/base/3171846/3172541 (deleted)
postgres 21116 postgres 69u REG 8,3 82452480 144333458 /var/lib/postgresql/9.1/main/base/3171846/3174341 (deleted)
root(at)db11:~#
root(at)db11:~# ps auxww | grep 21116
postgres 21116 0.0 0.1 100416 32332 ? Ss 00:26 0:16 postgres: repmgr repmgr 199.9.xxx.yyy(45239) idle
root 25755 0.0 0.0 6440 840 pts/2 S+ 19:38 0:00 grep --color=auto 21116
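For anyone wanting to total this up rather than eyeball it, here is a minimal sketch that sums the bytes still held in deleted files, grouped by PID and database OID. It assumes the lsof column layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME, followed by "(deleted)") and paths of the form .../base/<database OID>/<relfilenode>. The function name summarize_deleted is made up for illustration and is not part of lsof, PostgreSQL, or repmgr.

```python
import re
from collections import defaultdict

# Assumed path shape: .../base/<database OID>/<relfilenode>, optionally with
# a fork suffix such as _fsm or _vm and a segment suffix such as .1
BASE_PATH = re.compile(r"/base/(\d+)/\d+(?:_\w+)?(?:\.\d+)?$")


def summarize_deleted(lsof_lines):
    """Return {(pid, database_oid): total_bytes} for deleted data files.

    Lines that do not match the assumed layout (for example truncated or
    wrapped lines) are skipped rather than guessed at.
    """
    totals = defaultdict(int)
    for line in lsof_lines:
        line = line.strip()
        if not line.endswith("(deleted)"):
            continue
        fields = line.split()
        # Expect nine lsof columns plus the trailing "(deleted)" marker.
        if len(fields) != 10:
            continue
        pid, size, path = fields[1], fields[6], fields[8]
        match = BASE_PATH.search(path)
        if not (pid.isdigit() and size.isdigit() and match):
            continue
        totals[(int(pid), int(match.group(1)))] += int(size)
    return dict(totals)
```

Fed the four complete lines above, this attributes all of the held space to PID 21116 and the already-dropped database directory 3171846. In practice one might pass it the lines of saved lsof output and sort the result by size to see which backends hold the most space.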
======
With the database create/drop suspended we still see a steady accumulation of dead file descriptors, but at a slower rate.
< /dev/sda3 67G 28G 39G 42% /
---
> /dev/sda3 67G 29G 38G 44% /
Other than abandoning repmgr I don't see a solution. I've posted this to the repmgr discussion group but have had zero responses (and, frankly, am not holding my breath).
If anyone has any suggestions I'm all ears.
Thanks for the bandwidth!
Greg W.