Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From: Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-admin(at)postgresql(dot)org, pgsql(at)FreeBSD(dot)org
Subject: Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible
Date: 2019-03-08 21:14:37
Message-ID: 20190308211437.GA86762@gate.oper.dinoex.org
Lists: pgsql-admin

Hi Andrew,

many thanks for your efforts!

Let's see what I get out of this. First, it seems I can reproduce the
fault on my build machine (IvyBridge core-i5) in the i386-chroot as
well - which is not a surprise according to your explanations.

On Fri, Mar 08, 2019 at 04:51:47PM +0000, Andrew Gierth wrote:

! MOVAPS is an SSE (not SSE2) instruction; it's enabled by virtue of the
! fact that you used -march=pentium3 (the pentium3 supports SSE but not
! SSE2). The "A" stands for "aligned"; an unaligned source address causes
! an exception. %esp+0x20 is not correctly aligned for the instruction.
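
The alignment rule quoted above can be sketched as plain address arithmetic. This is only an illustration, assuming the 16-byte requirement that MOVAPS has for a memory operand; the stack pointer values and the helper names are hypothetical and not taken from the actual crash.

```python
SSE_ALIGNMENT = 16  # MOVAPS requires a 16-byte-aligned memory operand


def is_aligned(address: int, alignment: int = SSE_ALIGNMENT) -> bool:
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def movaps_operand_ok(esp: int, offset: int = 0x20) -> bool:
    """Check whether %esp+offset is a valid MOVAPS memory operand.

    0x20 is itself a multiple of 16, so the result depends only on
    whether the stack pointer was 16-byte aligned to begin with.
    """
    return is_aligned(esp + offset)


# Hypothetical stack pointer values, for illustration only.
aligned_esp = 0xFFBFD9E0     # ends in 0x0: multiple of 16
misaligned_esp = 0xFFBFD9EC  # ends in 0xC: 12 bytes past a 16-byte boundary

print(movaps_operand_ok(aligned_esp))
print(movaps_operand_ok(misaligned_esp))
```

Because the offset 0x20 is a multiple of 16, a misaligned %esp+0x20 implies that the stack pointer itself was not 16-byte aligned when the function was entered.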

Okay so far. I was occasionally wondering if that pentium3 option
would affect anything at all. Now we see, it does. ;)

! GCC defaults to using a 16-byte stack alignment, but it relies on the
! caller to align the stack too, so if a GCC-compiled function is called
! from code that doesn't align the stack, then this kind of error can
! result. I do not know offhand (but I plan to find out) what clang's
! default stack alignment on i386 is.

Well, what caused me a headache this evening is: who would be the
caller in this case, as -from my understanding- it is just PostgreSQL
running?
Now, with your newer mail, this riddle clears up well.

In my build environment, I can now create and start a new db-cluster
and issue only the single command "CREATE ROLE bacula;" and it will
crash - but then again I have to wait for the next checkpointer run.

! You can tell GCC to realign the stack itself using the -mstackrealign
! option.

Yep, that appears to solve it.
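
As a rough sketch of why realigning helps: conceptually, a realigning function prologue rounds the incoming stack pointer down to a 16-byte boundary before laying out its locals (on x86 this is typically done with an `and` against a mask). The Python below only models that arithmetic; the function name and the stack pointer value are hypothetical, and it does not claim to reproduce the exact code GCC emits.

```python
ALIGNMENT = 16
MASK_32 = 0xFFFFFFFF  # keep results within a 32-bit address space (i386)


def realign_stack(esp: int, alignment: int = ALIGNMENT) -> int:
    """Round esp down to the nearest multiple of alignment.

    alignment must be a power of two. Rounding down is safe here
    because the stack grows towards lower addresses.
    """
    return esp & ~(alignment - 1) & MASK_32


# Hypothetical misaligned stack pointer handed over by a caller
# that does not keep the stack 16-byte aligned.
incoming_esp = 0xFFBFD9EC
realigned_esp = realign_stack(incoming_esp)

print(hex(incoming_esp), "->", hex(realigned_esp))
```

With the stack pointer rounded down inside the callee, aligned loads and stores relative to it no longer depend on what the caller did.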

So, as there is a fix now, I'm wondering who would be responsible
for applying it?
* the system owner (alongside the CPU definition)
* the port maintainer (alongside the compiler choice)
* the postgres configure script

! This problem shows up only with GCC and not with clang because clang
! does not attempt to use SSE to vectorize this particular piece of code.
! The non-vectorized implementation generated by clang has no special
! requirements for stack alignment. But at the end of the day this is not
! a problem with PostgreSQL - it would show up with any code compiled with
! GCC where the compiler had elected to use SSE instructions for
! optimization.

Well, it's clearly my fault, coming up with that pentium3 option. *gg*

rgds, P.
