From: | Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org> |
---|---|
To: | pgsql-admin(at)postgresql(dot)org |
Cc: | pgsql(at)FreeBSD(dot)org |
Subject: | Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible |
Date: | 2019-03-08 01:20:12 |
Message-ID: | 20190308012012.GA49481@gate.oper.dinoex.org |
Lists: | pgsql-admin |
Hi Tom, Andrew,
many thanks for the replies! Alright, let's fill in some concrete
data:
> I'm assuming from the CC that this is on FreeBSD, but on what
> architecture?
While on my evening errands I realized that I should have mentioned
this. FreeBSD is correct; it is built on amd64 for i386, and runs on
i386.
Version:
FreeBSD 11.2-RELEASE-p9 #0 r343946M#C51:82
Build-Options:
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_UNSET+=DOCS
OPTIONS_FILE_UNSET+=DTRACE
OPTIONS_FILE_SET+=GSSAPI
OPTIONS_FILE_SET+=INTDATE
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_SET+=NLS
OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
OPTIONS_FILE_UNSET+=PAM
OPTIONS_FILE_SET+=SSL
OPTIONS_FILE_SET+=TZDATA
OPTIONS_FILE_SET+=XML
Extra Compiler-Options:
-march=pentium3
Init-Options:
--data-checksums --encoding=utf-8 --lc-collate=de_DE.UTF-8
--lc-ctype=de_DE.UTF-8 --lc-messages=en_US.UTF-8
--lc-monetary=en_US.UTF-8 --lc-numeric=en_US.UTF-8
--lc-time=en_US.UTF-8
Run-Options:
-w -m fast -o --config_file=/usr/local/etc/postgresql/postgresql.conf
Furthermore, FreeBSD imposed a change for release 10.6: it forces the
use of gcc on i386 (gcc-8 in this case). Earlier versions were built
with the system compiler, Clang. The commit log says this about the matter:
! r484807 | girgen | 2018-11-12 16:54:19 +0100 (Mon, 12 Nov 2018) | 5 lines
!
! Fix build problems on i386
!
! Use GCC seems to be proper way to do it. SSE2 would not be available
! for all CPU:s.
> Did it drop a core file (look in the data dir for postgres.core) and if
> so can you get a backtrace?
Looking... yes, there is a core. Let's grab a first-fault core,
as that one obviously is from the failed recovery:
! (gdb) core postgres.core.1st
! Core was generated by `postgres: bgworker: parallel worker for PID 68755 '.
! Program terminated with signal 10, Bus error.
! Reading symbols from <etc etc>
! #0 0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0 0x0838bdf2 in pg_checksum_page ()
! #1 0x0838a2b8 in PageIsVerified ()
! #2 0x5a824500 in ?? ()
! #3 0x00000000 in ?? ()
The second one looks this way:
! (gdb) core postgres.core
! Core was generated by `postgres: startup process recovering 000000010000002C000000C6'.
! Program terminated with signal 10, Bus error.
! Reading symbols from <lots of files>
! #0 0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0 0x0838bdf2 in pg_checksum_page ()
! #1 0x0838a2b8 in PageIsVerified ()
! #2 0x59e14500 in ?? ()
! #3 0x00000000 in ?? ()
Anything more I can do here? (Advice on how to build with debugging
support is appreciated.)
> You can check whether your CPU supports SSE2 by looking at the Features=
> line in /var/run/dmesg.boot. It seems unlikely that it does not, because
> SSE2 was introduced in 2000 with the Pentium 4.
No need to check; I am absolutely certain that it does NOT.
https://www.asus.com/supportonly/CUV4X-DLS/HelpDesk_CPU/
But your explanation does not seem to answer the fundamental question:
is the database at 10.6 still supposed to be able to run without SSE2?
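For readers whose hardware is less certain, the check suggested in the quoted text could be scripted roughly as follows. This is only a sketch: the function names and the sample text are made up for illustration, and it assumes the usual `Features=0x...<...>` layout that FreeBSD writes to /var/run/dmesg.boot on x86. The bit position (CPUID leaf 1, EDX bit 26) is the standard SSE2 flag.

```python
import re

# CPUID leaf 1, EDX bit 26 is the standard SSE2 feature flag.
SSE2_BIT = 1 << 26

# Hypothetical sample, shaped like a Pentium III entry in dmesg.boot.
SAMPLE_DMESG = """\
CPU: Intel Pentium III (hypothetical sample line)
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
"""


def cpu_features(dmesg_text):
    """Return (mask, names) from the first plain 'Features=' line, or None.

    The pattern is anchored at the line start so that 'Features2=' and
    'AMD Features=' lines are not picked up by mistake.
    """
    match = re.search(
        r"^\s*Features=(0x[0-9a-fA-F]+)<([^>]*)>", dmesg_text, re.MULTILINE
    )
    if match is None:
        return None
    mask = int(match.group(1), 16)
    names = set(match.group(2).split(","))
    return mask, names


def has_sse2(dmesg_text):
    """True if the Features line lists SSE2 by name or sets bit 26."""
    parsed = cpu_features(dmesg_text)
    if parsed is None:
        return False
    mask, names = parsed
    return "SSE2" in names or bool(mask & SSE2_BIT)


if __name__ == "__main__":
    print("SSE2 available:", has_sse2(SAMPLE_DMESG))
```

On a real machine one would read the file instead of the sample, for instance `has_sse2(open("/var/run/dmesg.boot").read())`.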
> It seems pretty unlikely that that'd have anything to do with a
> bus-error failure, anyway. But this report contains far too little
> information to let anyone do anything but speculate.
Whatever information you would like to have, just ask and I will gladly do
my best to obtain it as I get around to it. (This is a reproducible failure
in a very well maintained piece of software, so this is rather fun.)
Some more experiments & observations:
The crash happens at a specific query: I get parse and bind timings, but no
execute timing.
Furthermore, when I try and set
! max_parallel_workers_per_gather = 0
then the query goes through and delivers proper results. But then, after
a few minutes, I get this one:
! postgres[71256]: [8-1] :[] LOG: 00000: checkpointer process (PID 71258)
! was terminated by signal 10: Bus error
Different approach, same result:
! dynamic_shared_memory_type = posix -> crash immediate
! dynamic_shared_memory_type = sysv -> crash immediate
! dynamic_shared_memory_type = mmap -> crash immediate
! dynamic_shared_memory_type = none -> crash later in checkpointer
regards,
PMc