From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: "pg_ctl: the PID file ... is empty" at end of make check |
Date: | 2018-11-28 05:31:10 |
Message-ID: | CAEepm=1dONOF+hBijV45dw3nsKe+OazmFHK3Lr44AbRsVPZTyA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Nov 28, 2018 at 5:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > Today I saw a one-off case of $SUBJECT, on macOS. I can't reproduce
> > it, but I noticed exactly the same thing on longfin the other day:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-25%2005%3A39%3A04
>
> I trawled the buildfarm logs and discovered a second instance of exactly
> the same thing:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-19%2018%3A37%3A00
>
> There have not been any other occurrences in the past 3 months, which is
> as far back as I went. (lorikeet has half a dozen occurrences of "could
> not stop postmaster", which is what I was grepping for, but they all
> are associated with that machine's intermittent postmaster crashes.)
>
> So that lets out the flaky-hardware theory: that occurrence is before
> longfin's hardware transplant.
>
> Also, I don't think I believe the OS-bug idea either, given that you
> saw it on 10.14.0. longfin's been running 10.14.something since
> 2018-09-26, and has accumulated circa 200 runs since then just on HEAD,
> never mind the back branches. It'd be pretty unlikely to see it only
> in the past week, and only on HEAD, if it were an OS bug introduced two
> months ago.
Yeah, it would have been slightly easier to believe back when High Sierra
first came out and every HFS+ volume was silently migrated to the brand-new
APFS. But that idea seems like a long shot at this point.
> So my theory is we broke something in HEAD a couple weeks ago. But what?
Hmm. I'm not seeing it. I'm trying to reproduce it by running make check in a loop.
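For anyone who wants to try the same thing, a repeat-until-failure loop could be sketched roughly as below. This is only an illustration: the helper name `run_until_failure`, the run cap, and the optional marker string are assumptions made for the example, not anything taken from this thread or from the PostgreSQL tree.

```python
import subprocess


def run_until_failure(cmd, max_runs=100, marker=None, cwd=None):
    """Run cmd repeatedly until it fails or its output contains marker.

    Returns (run_number, combined_output) for the first suspicious run,
    or None if all max_runs runs looked clean.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout so one search covers both
            text=True,
        )
        failed = result.returncode != 0
        # Optionally treat a message in the output as a hit even if the exit code is 0.
        flagged = marker is not None and marker in result.stdout
        if failed or flagged:
            return run, result.stdout
    return None
```

Run from the top of a build tree, something like `run_until_failure(["make", "check"], max_runs=200, marker="PID file")` would stop at the first run that exits non-zero or mentions the PID file, and hand back that run's output for inspection.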
> The fsync changes you made are suspiciously close to this issue (ie one
> could explain it as written data not getting out), and were committed in
> the right time frame, but that change didn't affect writes to
> postmaster.pid, did it?
Commit 9ccdd7f6 doesn't affect writes to anything. It just changes
the elevel used when certain fsync calls fail (none of which are near
this code, incidentally, and in any case there was no such failure).
--
Thomas Munro
http://www.enterprisedb.com