Re: 9.2 pg_upgrade regression tests on WIndows

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.2 pg_upgrade regression tests on WIndows
Date: 2012-09-06 02:07:37
Message-ID: 20120906020737.GF29484@momjian.us
Lists: pgsql-hackers

On Wed, Sep 5, 2012 at 10:04:07PM -0400, Andrew Dunstan wrote:
>
> On 09/05/2012 09:42 PM, Bruce Momjian wrote:
> >On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:
> >>>OK, I worked with Andrew on this issue, and have applied the attached
> >>>patch which explains what is happening in this case. Andrew's #ifndef
> >>>WIN32 was the correct fix. I consider this issue closed.
> >>>
> >>
> >>It looks like we still have problems in this area :-( see <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&dt=2012-09-05%2023%3A05%3A16>
> >>
> >>Now it looks like somehow the fopen on the log file that isn't
> >>commented out is failing. But the identical code worked on the same
> >>machine on HEAD. So this does rather look like a timing issue.
> >>
> >>Investigating ...
> >Yes, that is very odd. It is also right after the code we just changed
> >to use binary mode to split the pg_dumpall file, split_old_dump().
> >
> >The code is doing pg_ctl -w stop, then starting a new postmaster with
> >pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
> >pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
> >then it returns complete. I suppose it is possible that the pid file is
> >getting removed, pg_ctl is returning done, but the pg_ctl binary is
> >still running, holding open those log files.
> >
> >I guess the buildfarm is showing us the problems in pg_upgrade, as it
> >should. I think you might be right that we need to add a sleep(1) at
> >the end of stop_postmaster on Windows, and document it is to give the
> >postmaster time to release its log files.
>
>
>
> Icky. I wish there were some nice portable flock() mechanism we could use.
>
> I just re-ran the test on the same machine, same code, same
> everything as the reported failure, and it passed, so it definitely
> looks like it's a timing issue.
>
> I'd be inclined to put a loop around that fopen() to try it once
> every second for, say, 5 seconds.

Yes, good idea.
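
For illustration, the retry loop suggested above might look roughly like the
sketch below. This is not the actual pg_upgrade code: the helper name
fopen_with_retry, the sleep_one_second macro, and the max_tries parameter are
all hypothetical, and the one-second interval and five attempts are just the
numbers floated in the quoted text.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#define sleep_one_second() Sleep(1000)	/* Win32 Sleep() takes milliseconds */
#else
#include <unistd.h>
#define sleep_one_second() sleep(1)
#endif

/*
 * Hypothetical helper (not from the pg_upgrade source): try to open a file,
 * retrying once per second, in case another process that has just exited is
 * still holding the file open.  Returns NULL if every attempt fails.
 */
static FILE *
fopen_with_retry(const char *path, const char *mode, int max_tries)
{
	FILE	   *fp = NULL;
	int			attempt;

	for (attempt = 0; attempt < max_tries; attempt++)
	{
		fp = fopen(path, mode);
		if (fp != NULL)
			break;

		/* no point in sleeping after the final failed attempt */
		if (attempt + 1 < max_tries)
			sleep_one_second();
	}

	return fp;
}
```

A caller would use something like fopen_with_retry(log_path, "a", 5) in place
of the plain fopen() and keep the existing error handling for a NULL result,
so a genuine failure is still reported after roughly four seconds of waiting.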

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
