| From: | Mark Dilger <markdilger(at)yahoo(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | deepak <deepak(dot)pn(at)gmail(dot)com>, Alban Hertroys <haramrae(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: FATAL: lock file "postmaster.pid" already exists | 
| Date: | 2012-05-23 17:08:45 | 
| Message-ID: | 1337792925.21167.YahooMailNeo@web39304.mail.mud.yahoo.com | 
| Lists: | pgsql-general | 
Prior to posting to the mailing list, we made some
changes in postmaster.c to identify where time was
being spent.  Based on the elog(NOTICE,...) lines
we put in the file, we determined the time was spent
inside RemovePgTempFiles.
I then altered RemovePgTempFiles to take a starttime
parameter and, while recursing, to check whether more
than 5 seconds have passed since it started.  I did not
want to add the complexity of setting an alarm and catching
the signal, so I just made the code check the wall-clock
time at each step of the recursion.  Once more than
5 seconds have passed, it does not recurse further.
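
For readers who want to see the shape of that check, here is a
minimal standalone sketch.  It is not the actual patch to
RemovePgTempFiles; the function name walk_with_deadline, the
WALK_BUDGET_SECS macro, and the entry counting are all hypothetical,
and it only walks a directory tree rather than removing anything.
It assumes a POSIX system (opendir, readdir, lstat).

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical budget, matching the 5 seconds mentioned above. */
#define WALK_BUDGET_SECS 5.0

/*
 * Recursively walk "path" and count the entries seen.  The wall-clock
 * time is checked at each step of the recursion; once more than
 * budget_secs have elapsed since starttime, no further directory is
 * opened.  Returns the number of entries seen, 0 if the budget was
 * already spent, or -1 if the directory could not be opened.
 */
static long
walk_with_deadline(const char *path, time_t starttime, double budget_secs)
{
    DIR           *dir;
    struct dirent *de;
    long           seen = 0;

    assert(path != NULL);

    /* Out of time: do not recurse any further. */
    if (difftime(time(NULL), starttime) > budget_secs)
        return 0;

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        char        child[4096];
        struct stat st;
        int         n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
        if (n < 0 || (size_t) n >= sizeof(child))
            continue;           /* path too long for the buffer; skip it */

        seen++;

        /* lstat, not stat, so symbolic links are never followed. */
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
        {
            long sub = walk_with_deadline(child, starttime, budget_secs);

            if (sub > 0)
                seen += sub;
        }
    }

    closedir(dir);
    return seen;
}
```

A caller would pass the time at which the walk began, for example
walk_with_deadline(dir, time(NULL), WALK_BUDGET_SECS).  Because the
clock is only consulted on entry to each directory, a single very
large directory can still overrun the budget, which is the price of
avoiding an alarm and signal handler.
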
After making this change, we have not been able to
reproduce the slowness.
We do not consider this a fix to the problem.  It is just
a tool for verifying where the slowness comes from.
________________________________
 From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <markdilger(at)yahoo(dot)com> 
Cc: deepak <deepak(dot)pn(at)gmail(dot)com>; Alban Hertroys <haramrae(at)gmail(dot)com>; "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> 
Sent: Wednesday, May 23, 2012 9:50 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists 
 
Mark Dilger <markdilger(at)yahoo(dot)com> writes:
> I tried moving the call to RemovePgTempFiles until
> after the PID file is fully written, but it did not help.
I wonder whether you correctly identified the source of the slowness.
The thing I would have suspected is identify_system_timezone(), which
will attempt to read every file in the timezone-database directory tree,
of which there are about 600.  It's not unusual for that to take several
seconds on a cold-started machine that doesn't have any of that tree in
filesystem cache.  It's still a stretch to believe that it'd take
several minutes on any storage system more advanced than a floppy disk;
but at least we'd only be trying to pin about one order of magnitude
slowdown on the filesystem, rather than several orders.
If that is what is causing it, there is a very simple workaround, which
is to set the timezone setting explicitly in postgresql.conf instead of
leaving the postmaster to try to figure it out from the environment.
(9.2 will use a better answer, which is for initdb to do this once and
store the result in postgresql.conf.)
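
As an illustration of that workaround, the setting is a single line
in postgresql.conf.  The zone name shown here is only a placeholder
assumption; use whichever timezone-database name applies to the
server.

```
# postgresql.conf
timezone = 'UTC'    # placeholder value; substitute the server's actual zone
```
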
regards, tom lane