Re: Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai <tzeggai(at)empirica-systeme(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Segfault 11 on PG10 with max_parallel_workers_per_gather>3
Date: 2017-10-25 20:33:13
Message-ID: 8922510d-699f-3f4d-4766-ff915e91c645@empirica-systeme.de
Lists: pgsql-bugs

Hi

To be precise, I can only reproduce the bug about 20% of the times I
execute the query. I have to run the query four or five times before it
crashes. I have reproduced it many times this way.
I have a feeling that it has to do with the number of parallel workers
that the planner starts. I found no way to force it to any number.
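
As a rough sanity check on the 20% figure, here is a minimal sketch of the arithmetic. It assumes each run of the query crashes independently with the same probability, which is only an assumption; the function name is made up for illustration.

```python
def runs_until_crash_stats(p_crash, runs):
    """Return (expected runs until the first crash,
    probability of at least one crash within `runs` runs).

    Assumes independent runs with a constant crash probability.
    """
    if not 0.0 < p_crash <= 1.0:
        raise ValueError("p_crash must be in (0, 1]")
    # Mean of a geometric distribution: 1 / p.
    expected_runs = 1.0 / p_crash
    # Complement of "no crash in any of the runs".
    p_at_least_one = 1.0 - (1.0 - p_crash) ** runs
    return expected_runs, p_at_least_one


expected, p5 = runs_until_crash_stats(0.20, 5)
print(f"expected runs until first crash: {expected:.1f}")
print(f"chance of at least one crash in 5 runs: {p5:.1%}")
```

Under that assumption, a 20% crash rate means about five runs on average before the first crash, which matches the "four or five times" above.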

I have seen this segfault on at least two machines (running the same
application with same data). Have not seen it since I lowered
max_parallel_workers_per_gather to 2.

I tried to generate a table+matview+indexes etc. to reproduce the crash
from scratch, but I have had no success so far.

I also tried to get a sensible stack trace. I attached 9 gdb instances
to all postgres PIDs, and when I triggered the crash, two of them had
some output and produced something on 'bt'. Both are attached.
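
A minimal sketch of how the gdb sessions described above could be scripted, one per process. The PIDs are placeholders, the helper name is hypothetical, and telling gdb to pass SIGUSR1 through silently is my assumption about what is wanted here, since PostgreSQL processes routinely signal each other with SIGUSR1 and gdb stops on it by default. The script only prints the command lines; it does not attach to anything.

```python
import shlex


def gdb_attach_command(pid):
    """Build a gdb command line that attaches to `pid`, passes SIGUSR1
    to the process without stopping, resumes it, and prints a backtrace
    once gdb stops again (for example on SIGSEGV)."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "handle SIGUSR1 nostop noprint pass",
        "-ex", "continue",
        "-ex", "bt",
    ]


# Placeholder PIDs for illustration; real ones would come from ps or
# from pg_backend_pid() in the session that runs the query.
example_pids = [12345, 12346]
commands = [gdb_attach_command(pid) for pid in example_pids]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be run in its own terminal before triggering the query. If the attached process exits instead of crashing, 'bt' will have no stack to print.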

If I were able to dump the relevant data from my DB and reproduce the
crash with it on a fresh PG10 install, would anyone have time to look
at it? I guess it would be no more than 50 MB...

I am happy to help as well as I can,

Steve

Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
84 in ../sysdeps/unix/syscall-template.S
Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
84 in ../sysdeps/unix/syscall-template.S
#0 0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1 0x00005564bcaccd01 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffce2d47e90, cur_timeout=200, set=0x5564beab53a8) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1048
#2 WaitEventSetWait (set=set@entry=0x5564beab53a8,
timeout=timeout@entry=200,
occurred_events=occurred_events@entry=0x7ffce2d47e90,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=83886093)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1000
#3 0x00005564bcacd174 in WaitLatchOrSocket (latch=0x7f1227241be4,
wakeEvents=wakeEvents@entry=25, sock=sock@entry=-1, timeout=200,
wait_event_info=wait_event_info@entry=83886093) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:385
#4 0x00005564bcacd225 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=25, timeout=<optimized out>,
wait_event_info=wait_event_info@entry=83886093) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:339
#5 0x00005564bca8193f in WalWriterMain () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/walwriter.c:293
#6 0x00005564bc8c0401 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffce2d48070) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/bootstrap/bootstrap.c:442
#7 0x00005564bca7cd83 in StartChildProcess (type=WalWriterProcess) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:5313
#8 0x00005564bca7e11a in reaper (postgres_signal_arg=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:2871
#9 <signal handler called>
#10 0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#11 0x00005564bc82a489 in ServerLoop () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#12 0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#13 0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228

########## second one:

Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
84 in ../sysdeps/unix/syscall-template.S
#0 0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1 0x00005564bc82a489 in ServerLoop () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#2 0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#3 0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228

Attachment         Content-Type  Size
debuglog37406.txt  text/plain    2.7 KB
debuglog26479.txt  text/plain    936 bytes
