Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Palle Girgensohn <girgen(at)FreeBSD(dot)org>
Cc: Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>, pgsql-admin(at)postgresql(dot)org, "pgsql(at)freebsd(dot)org" <pgsql(at)FreeBSD(dot)org>
Subject: Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible
Date: 2019-03-08 20:13:49
Message-ID: 875zstb2hr.fsf@news-spur.riddles.org.uk
Lists: pgsql-admin

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Andrew> So I'm going to guess that your bug 236025 is actually an
Andrew> alignment problem, with the compiler making some assumption
Andrew> about alignment that we're violating. I'll investigate and see
Andrew> what I can find.

OK, I have completed my analysis of both reports.

The bottom line is that this is a disagreement between GCC and the
(clang-compiled) system libraries over what the stack alignment should
be: GCC wants and assumes 16-byte alignment, but clang won't provide
that. It's not any kind of bug in PostgreSQL.

For most applications there is no issue because GCC aligns the stack
itself on entry into main(), so the only time it becomes an issue is if
two conditions are met: (1) the application must call into an outside
(non-GCC-compiled) library which then calls _back_ into the application,
AND (2) the subsequent code executes instructions that rely on the stack
alignment for correctness (and not just performance).

PostgreSQL compiled by GCC on i386 without architecture options will not
rely on the alignment of the stack, so condition (2) is not met. Only if
you specify an architecture such as -march=pentium3 (which enables SSE)
will any instructions be used which require strict alignment.

It may not be obvious how condition (1) is met, but notice that the
report from Peter has the crash happening in either a background worker
or the checkpointer process; this is significant because those are
spawned from postmaster while in a signal handler, and the signal
handler's stack frame has disturbed the stack alignment (and with the
system libraries compiled with clang and not GCC, no attempt is made to
adjust that).
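
To make the signal-handler effect observable, here is a minimal sketch (not
taken from PostgreSQL) that compares the frame address modulo 16 on an
ordinary call path with the same value measured inside a signal handler. All
function names are hypothetical, __builtin_frame_address is a GCC/clang
builtin, and SIGUSR1 is an arbitrary choice of signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: this function's frame address modulo 16.
 * noinline keeps it a real call with its own frame. */
__attribute__((noinline)) static unsigned frame_residue(void)
{
    return (unsigned)((uintptr_t)__builtin_frame_address(0) % 16);
}

/* Written by the handler, read after raise() returns. */
static volatile sig_atomic_t residue_in_handler = -1;

static void probe_handler(int signo)
{
    (void)signo;
    residue_in_handler = (sig_atomic_t)frame_residue();
}

/* Residue seen when the probe is reached by an ordinary call. */
__attribute__((noinline)) unsigned residue_normal_call(void)
{
    return frame_residue();
}

/* Residue seen when the same probe runs inside a signal handler. */
unsigned residue_signal_call(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    /* raise() returns only after the handler has run. */
    raise(SIGUSR1);
    return (unsigned)residue_in_handler;
}
```

If residue_normal_call() and residue_signal_call() return different values,
the handler was entered with a different stack alignment than ordinary code
sees, which is the situation described above. The absolute value is not
meaningful on its own, only the comparison, and the probe should be built
without -mstackrealign, since that option changes the prologue being
measured.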

So the implications for the postgresql port on freebsd/i386 are:

1. If you compile with GCC and no architecture options you should have
no problems on any cpu.

This presumably covers the case of the packaged binaries.

2. If you compile with GCC and any of -msse, -msse2, -march=pentium3 or
later, or any similar flag that enables use of SSE or later (I believe
that no MMX instructions require special alignment), then you will also
need -mstackrealign (or patch the source to add the equivalent attribute
to every signal handler function or other callback, which I don't really
recommend). (Maybe the port should add this option defensively?)

The crash in (freebsd) bug #236025 is explained by the fact that the
user had -msse2 set when compiling with GCC. Peter's crash is explained
by the use of -march=pentium3 when compiling with GCC.

3. If you compile with clang and -msse2 then there should be no stack
alignment issues (since clang doesn't assume the stack is aligned) but
obviously you then can't run the binary on a pre-pentium4 cpu.
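
For the per-function alternative mentioned in point 2, the following is a
sketch of what annotating a single signal handler might look like. It assumes
the attribute meant is GCC's x86-specific force_align_arg_pointer, which the
message above does not name; the REALIGN_STACK macro and the function names
are hypothetical. The handler records how far a 16-byte-aligned local ends up
from a 16-byte boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* force_align_arg_pointer is an x86-only GCC function attribute; on
 * other targets this hypothetical macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Written by the handler, read after raise() returns. */
static volatile sig_atomic_t buffer_misalignment = -1;

REALIGN_STACK static void aligned_handler(int signo)
{
    /* Stands in for a stack slot that SSE code would access with
     * alignment-requiring instructions. */
    _Alignas(16) unsigned char scratch[16];
    /* volatile forces the address to be read back at run time
     * instead of being folded from the declared alignment. */
    volatile uintptr_t addr = (uintptr_t)scratch;

    (void)signo;
    buffer_misalignment = (sig_atomic_t)(addr % 16);
}

/* Installs the handler, triggers it once, returns the recorded
 * misalignment (0 means the local was 16-byte aligned). */
int handler_buffer_misalignment(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = aligned_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1);
    return (int)buffer_misalignment;
}
```

With the attribute in place the function realigns the stack in its own
prologue, so the result should be 0 regardless of how the handler was
entered. Doing this for every handler and callback is the maintenance burden
that makes -mstackrealign the more practical choice.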

--
Andrew (irc:RhodiumToad)
