Re: snapper vs. HEAD

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgbf(at)twiska(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: snapper vs. HEAD
Date: 2020-03-29 23:17:08
Message-ID: 20200329231708.5yop3ni3rutjmkkh@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-28 23:50:32 -0400, Tom Lane wrote:
> Buildfarm member snapper has been crashing in the core regression tests
> since commit 17a28b0364 (well, there's a bit of a range of uncertainty
> there, but 17a28b0364 looks to be the only such commit that could have
> affected code in gistget.c where the crash is). Curiously, its sibling
> skate is *not* failing, despite being on the same machine and compiler.

Hm. There are some differences in code-gen-specific options.

snapper has:
'CFLAGS' => '-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security ',
'CPPFLAGS' => '-D_FORTIFY_SOURCE=2',
'LDFLAGS' => '-Wl,-z,relro -Wl,-z,now'
and specifies (among others)
'--enable-thread-safety',
'--with-gnu-ld',
whereas skate has --enable-cassert.

It's not too hard to imagine that several of these could cause enough
code-gen differences that one animal exhibits the bug and the other
doesn't.

The command lines for gistget.c end up being:

snapper:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c
skate:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o gistget.o gistget.c
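To make the comparison easier to read, here is a small sketch that splits both command lines on whitespace and reports the tokens each one has that the other lacks. It assumes plain whitespace splitting is good enough (true for these two lines, which have no quoted arguments); `extra_options` is a hypothetical helper, not part of any build tooling.

```python
from collections import Counter

# gistget.c command lines, copied from the two build logs above.
SNAPPER = """
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -g -g -O2 -fstack-protector
--param=ssp-buffer-size=4 -Wformat -Werror=format-security
-I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE
-I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c
"""

SKATE = """
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -g -O2 -I../../../../src/include -D_GNU_SOURCE
-I/usr/include/libxml2 -c -o gistget.o gistget.c
"""


def extra_options(cmd, other):
    """Return the tokens of `cmd` that have no counterpart in `other`.

    Duplicates are counted: a flag given twice in `cmd` but once in
    `other` is reported once. Order follows `cmd`.
    """
    available = Counter(other.split())
    extra = []
    for tok in cmd.split():
        if available[tok] > 0:
            available[tok] -= 1  # consume one matching occurrence
        else:
            extra.append(tok)
    return extra


if __name__ == "__main__":
    print("only in snapper:", " ".join(extra_options(SNAPPER, SKATE)))
    print("only in skate:  ", " ".join(extra_options(SKATE, SNAPPER)))
```

For snapper this reports a repeated -g, -fstack-protector, --param=ssp-buffer-size=4, -Wformat, -Werror=format-security, -D_FORTIFY_SOURCE=2 and -I/usr/include/mit-krb5; skate's list is empty, because --enable-cassert takes effect through the generated pg_config.h rather than through the compiler command line.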

> I looked into this by dint of setting up a similar environment in a
> qemu VM. I might not have reproduced things exactly, but I managed
> to get the same kind of crash at approximately the same place, and
> what it looks like to me is a compiler bug.

What options were you using? Were you reproducing snapper's as exactly as possible?

> It's unclear how 17a28b0364 would have affected this, but there is
> an elog() call elsewhere in the same function, so maybe the new
> coding for that changed register assignments or some other
> phase-of-the-moon effect.

Yeah, that wouldn't be too surprising.

> I doubt that anyone's going to take much interest in fixing this
> old compiler version, so my recommendation is to back off the
> optimization level on snapper to -O1, and probably on skate as
> well because there's no obvious reason why the same compiler bug
> might not bite skate at some point. I was able to get through
> the core regression tests on my qemu VM after recompiling
> gistget.c at -O1 (with other flags the same as snapper is using).

If you still have the environment, it might make sense to check whether
the crash is related to one of the other options. Otherwise, I wouldn't
be against the proposal.
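Checking whether one of the other options is responsible amounts to a leave-one-out experiment over the options that only snapper uses. A minimal sketch, assuming one rebuilds only gistget.c by hand: it prints snapper's command line with each snapper-only option dropped in turn, plus a variant with -O2 replaced by -O1 as in the workaround quoted above. `leave_one_out` and `with_opt_level` are hypothetical helpers, and actually running each command and the regression tests is left to whatever environment reproduces the crash.

```python
# snapper's gistget.c command line, copied from its build log above.
SNAPPER_CMD = """
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -g -g -O2 -fstack-protector
--param=ssp-buffer-size=4 -Wformat -Werror=format-security
-I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE
-I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o gistget.o gistget.c
""".split()

# Options present in snapper's command line but absent from skate's
# (taken from comparing the two quoted command lines).
SNAPPER_ONLY = [
    "-fstack-protector",
    "--param=ssp-buffer-size=4",
    "-Wformat",
    "-Werror=format-security",
    "-D_FORTIFY_SOURCE=2",
    "-I/usr/include/mit-krb5",
]


def leave_one_out(cmd, candidates):
    """Yield (dropped, variant) pairs: `cmd` with one candidate removed."""
    for opt in candidates:
        yield opt, [tok for tok in cmd if tok != opt]


def with_opt_level(cmd, level):
    """Return a copy of `cmd` with -O2 replaced by `level`, e.g. "-O1"."""
    return [level if tok == "-O2" else tok for tok in cmd]


if __name__ == "__main__":
    for dropped, variant in leave_one_out(SNAPPER_CMD, SNAPPER_ONLY):
        print(f"# without {dropped}")
        print(" ".join(variant))
    print("# snapper's flags at -O1")
    print(" ".join(with_opt_level(SNAPPER_CMD, "-O1")))
```

Each printed line differs from snapper's by exactly one option, so a build that stops crashing points at the option that was dropped (or at an interaction involving it).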

Greetings,

Andres Freund
