Re: Segmentation Fault

From: Robert Sanford <wobbet(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Segmentation Fault
Date: 2023-09-13 22:31:12
Message-ID: CABa+nRuatSjnqsPfnoo0APzgeM1Um94FesYj6H1YKB1qJ8JoqA@mail.gmail.com
Lists: pgsql-bugs

I realized that when I was running GDB I had only captured a snapshot of the
moment it failed. Below I've added the output after continuing and requesting
a backtrace.
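
Side note on the output below: the repeated SIGUSR1 stops don't look like the
crash itself. As far as I can tell, PostgreSQL uses SIGUSR1 internally for
latch wakeups, and the backtrace is just the postmaster idling in select().
For the next attempt, something like the following sketch might catch the
actual segfault, attaching to the backend running the query rather than to
the postmaster (the backend PID below is a placeholder):

# in the crashing psql session, find that backend's PID:
#   SELECT pg_backend_pid();
# then attach GDB to that process instead of the postmaster:
gdb -p <backend-pid>
(gdb) handle SIGUSR1 nostop noprint pass   # pass the latch signal through quietly
(gdb) cont
# rerun the query; after the SIGSEGV stop:
(gdb) bt full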

rjsjr

Continuing.
[Detaching after fork from child process 297277]
[Detaching after fork from child process 297278]
[Detaching after fork from child process 297279]
[Detaching after fork from child process 297280]
[Detaching after fork from child process 297281]
[Detaching after fork from child process 297282]
[Detaching after fork from child process 297283]
[Detaching after fork from child process 297284]
[Detaching after fork from child process 297286]
[Detaching after fork from child process 297287]
[Detaching after fork from child process 297289]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) bt
#0 0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000aaaab8124a1c in ?? ()
#2 0x0000aaaab81263cc in PostmasterMain ()
#3 0x0000aaaab7e753d8 in main ()
(gdb) c
Continuing.
[Detaching after fork from child process 297301]
[Detaching after fork from child process 297304]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) bt
#0 0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000aaaab8124a1c in ?? ()
#2 0x0000aaaab81263cc in PostmasterMain ()
#3 0x0000aaaab7e753d8 in main ()
(gdb) c
Continuing.
[Detaching after fork from child process 297306]
[Detaching after fork from child process 297311]
[Detaching after fork from child process 297320]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) c
Continuing.
[Detaching after fork from child process 297330]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) c
Continuing.
[Detaching after fork from child process 297332]
[Detaching after fork from child process 297333]
[Detaching after fork from child process 297334]
[Detaching after fork from child process 297335]
[Detaching after fork from child process 297336]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) c
Continuing.
[Detaching after fork from child process 297337]
[Detaching after fork from child process 297338]
[Detaching after fork from child process 297339]
[Detaching after fork from child process 297344]
[Detaching after fork from child process 297347]

Program received signal SIGUSR1, User defined signal 1.
0x0000ffffbbd6e844 in __GI___select (nfds=9, readfds=0xffffde28eab0,
writefds=0x0, exceptfds=0x0, timeout=0xffffde28ea18) at
../sysdeps/unix/sysv/linux/select.c:69
69 in ../sysdeps/unix/sysv/linux/select.c
(gdb) c
Continuing.
[Detaching after fork from child process 297351]
[Detaching after fork from child process 297352]
[Detaching after fork from child process 297353]
[Detaching after fork from child process 297354]
[Detaching after fork from child process 297355]

On Wed, Sep 13, 2023 at 4:53 PM Robert Sanford <wobbet(at)gmail(dot)com> wrote:

> PostgreSQL v15.4
> Running on Ubuntu Jammy
> ARM64 x8
> 64 GB RAM
>
> I have been trying to tune the performance of a query and wasn't having a
> lot of luck. I decided to run VACUUM ANALYZE to see if that would help.
>
> The Postgres server crashed.
>
> I then went through the tables individually, running VACUUM ANALYZE on each
> one by one. Done that way, it completed successfully.
>
> I ran my query to see if that had helped performance. Segmentation fault.
> The server restarted itself and came back to life. Reran the query and it
> crashed again. Every time I run the query I get the same segmentation
> fault. Different queries run just fine.
>
> I tried attaching GDB to the PID that the log file reports as the main
> startup process. That seems to have worked, in that it told me that various
> libraries were loaded. I'm not sure whether that is sufficient. I'm pretty
> sure it isn't.
>
> I set up GDB following DebuggingProgramCrash - Ubuntu Wiki
> <https://wiki.ubuntu.com/DebuggingProgramCrash>, which says "If you are on
> Ubuntu Jammy or later, you do not need to worry about installing debug
> symbol packages anymore." I am running on Jammy, so that should be good.
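>
> (Given the "?? ()" frames in the backtraces, though, the symbols may not
> actually be resolving. My understanding is that Jammy does this via the
> debuginfod service, so setting the variable below before launching GDB
> should fetch symbols on demand. One caveat: since this is the pgdg build
> rather than Ubuntu's own package, Ubuntu's debuginfod may not carry its
> symbols, in which case a dbgsym package for the pgdg build, if one exists,
> would be the fallback.)
>
> export DEBUGINFOD_URLS="https://debuginfod.ubuntu.com"
> gdb -p <postmaster-pid>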
>
> I then tried getting a Valgrind trace from PostgreSQL. I used "systemctl
> stop postgresql" to stop the service, then launched Valgrind using the
> command line below my sig. Everything worked just fine, except the query
> took 10x as long as it did before I started the VACUUM ANALYZE.
>
> So I restarted the service, reran my query, and it failed with the same log
> entries showing "signal 11: Segmentation fault".
>
> Running it inside of Valgrind prevented the error from happening, which is
> strange to me.
>
> What can I do to provide enough information to figure out why this is
> crashing?
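>
> If a core file would be more useful than a live GDB session, I could also
> enable core dumps before the next run. A rough sketch, with the unit name
> and core-file location guessed for this Ubuntu install:
>
> # let the service write core files
> sudo systemctl edit postgresql@15-main
> #   (add under [Service]: LimitCORE=infinity)
> sudo systemctl restart postgresql@15-main
> # after the next segfault, open the core in GDB and get a full backtrace
> gdb /usr/lib/postgresql/15/bin/postgres /path/to/core
> (gdb) bt full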
>
> rjsjr
>
>
>
> Valgrind Command
> valgrind --leak-check=no --gen-suppressions=all \
>   --suppressions=src/tools/valgrind.supp --time-stamp=yes \
>   --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
>   --log-file=/datadrive/%p.log --trace-children=yes \
>   /usr/lib/postgresql/15/bin/postgres \
>   --config-file=/etc/postgresql/15/main/postgresql.conf \
>   --log_line_prefix="%m %p " \
>   --log_statement=all --shared_buffers=64MB 2>&1 | tee /datadrive/postmaster.log
>
>
> GDB attach output and backtrace
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> 0x0000ffffb9c6e844 in __GI___select (nfds=9, readfds=0xffffd0849280,
> writefds=0x0, exceptfds=0x0, timeout=0xffffd08491e8) at
> ../sysdeps/unix/sysv/linux/select.c:69
> 69 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
> (gdb) bt
> #0 0x0000ffffb9c6e844 in __GI___select (nfds=9, readfds=0xffffd0849280,
> writefds=0x0, exceptfds=0x0, timeout=0xffffd08491e8) at
> ../sysdeps/unix/sysv/linux/select.c:69
> #1 0x0000aaaab91b4a1c in ?? ()
> #2 0x0000aaaab91b63cc in PostmasterMain ()
> #3 0x0000aaaab8f053d8 in main ()
>
>
> PostgreSQL Log File
> 2023-09-13 20:20:21.198 UTC [295031] LOG: starting PostgreSQL 15.4
> (Ubuntu 15.4-1.pgdg20.04+1) on aarch64-unknown-linux-gnu, compiled by gcc
> (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
> 2023-09-13 20:20:21.198 UTC [295031] LOG: listening on IPv4 address
> "0.0.0.0", port 5432
> 2023-09-13 20:20:21.198 UTC [295031] LOG: listening on IPv6 address "::",
> port 5432
> 2023-09-13 20:20:21.203 UTC [295031] LOG: listening on Unix socket
> "/var/run/postgresql/.s.PGSQL.5432"
> 2023-09-13 20:20:21.218 UTC [295037] LOG: database system was shut down
> at 2023-09-13 20:20:20 UTC
> 2023-09-13 20:20:21.228 UTC [295031] LOG: database system is ready to
> accept connections
> 2023-09-13 20:21:21.604 UTC [295031] LOG: server process (PID 295102) was
> terminated by signal 11: Segmentation fault
> 2023-09-13 20:21:21.604 UTC [295031] DETAIL: Failed process was running:
> --explain analyze
> select
> a.equipmenttype
> , a.technology
> , count(*)
> from
> newfaultentrylineitems a
> where
> a.databaseguid = '9a277d83-30eb-4736-99f2-d028ff84611e'
> and a.assessmentyearmonth >= 202209
> and a.assessmentyearmonth < 202309
> group by a.equipmenttype, a.technology
> order by count(*) desc
> 2023-09-13 20:21:21.604 UTC [295031] LOG: terminating any other active
> server processes
> 2023-09-13 20:21:21.610 UTC [295031] LOG: all server processes
> terminated; reinitializing
> 2023-09-13 20:21:22.074 UTC [295109] LOG: database system was
> interrupted; last known up at 2023-09-13 20:20:21 UTC
> 2023-09-13 20:21:22.137 UTC [295112] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:21:22.187 UTC [295109] LOG: database system was not
> properly shut down; automatic recovery in progress
> 2023-09-13 20:21:22.195 UTC [295109] LOG: redo starts at 17/6AFACF40
> 2023-09-13 20:21:22.195 UTC [295109] LOG: invalid record length at
> 17/6AFACF78: wanted 24, got 0
> 2023-09-13 20:21:22.195 UTC [295109] LOG: redo done at 17/6AFACF40 system
> usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 2023-09-13 20:21:22.208 UTC [295110] LOG: checkpoint starting:
> end-of-recovery immediate wait
> 2023-09-13 20:21:22.383 UTC [295113] FATAL: the database system is not
> yet accepting connections
> 2023-09-13 20:21:22.383 UTC [295113] DETAIL: Consistent recovery state
> has not been yet reached.
> 2023-09-13 20:21:22.428 UTC [295110] LOG: checkpoint complete: wrote 3
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.199 s,
> sync=0.004 s, total=0.224 s; sync files=2, longest=0.003>
> 2023-09-13 20:21:22.435 UTC [295031] LOG: database system is ready to
> accept connections
> 2023-09-13 20:40:24.416 UTC [295031] LOG: server process (PID 295140) was
> terminated by signal 11: Segmentation fault
> 2023-09-13 20:40:24.416 UTC [295031] DETAIL: Failed process was running:
> --explain analyze
> select
> a.equipmenttype
> , a.technology
> , count(*)
> from
> newfaultentrylineitems a
> where
> a.databaseguid = '9a277d83-30eb-4736-99f2-d028ff84611e'
> and a.assessmentyearmonth >= 202209
> and a.assessmentyearmonth < 202309
> group by a.equipmenttype, a.technology
> order by count(*) desc
> 2023-09-13 20:40:24.416 UTC [295031] LOG: terminating any other active
> server processes
> 2023-09-13 20:40:24.666 UTC [295523] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:24.922 UTC [295524] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:25.182 UTC [295525] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:25.379 UTC [295526] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:25.574 UTC [295527] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:25.775 UTC [295528] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:25.977 UTC [295529] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:26.179 UTC [295530] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:26.377 UTC [295531] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:26.578 UTC [295532] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:26.777 UTC [295533] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:26.973 UTC [295534] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:27.166 UTC [295535] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:27.372 UTC [295536] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:27.568 UTC [295537] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:27.766 UTC [295538] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:27.961 UTC [295539] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:28.163 UTC [295540] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:28.389 UTC [295541] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:28.589 UTC [295542] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:28.784 UTC [295543] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:28.978 UTC [295544] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:29.114 UTC [295031] LOG: issuing SIGKILL to recalcitrant
> children
> 2023-09-13 20:40:29.178 UTC [295545] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:29.403 UTC [295546] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:29.603 UTC [295547] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:29.798 UTC [295548] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:30.017 UTC [295549] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:30.214 UTC [295550] FATAL: the database system is in
> recovery mode
> 2023-09-13 20:40:58.414 UTC [295031] LOG: all server processes
> terminated; reinitializing
> 2023-09-13 20:40:58.876 UTC [295553] LOG: database system was
> interrupted; last known up at 2023-09-13 20:21:22 UTC
> 2023-09-13 20:40:58.986 UTC [295553] LOG: database system was not
> properly shut down; automatic recovery in progress
> 2023-09-13 20:40:58.992 UTC [295553] LOG: redo starts at 17/6AFACFF0
> 2023-09-13 20:40:58.992 UTC [295553] LOG: invalid record length at
> 17/6AFAD028: wanted 24, got 0
> 2023-09-13 20:40:58.992 UTC [295553] LOG: redo done at 17/6AFACFF0 system
> usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 2023-09-13 20:40:59.006 UTC [295554] LOG: checkpoint starting:
> end-of-recovery immediate wait
> 2023-09-13 20:40:59.228 UTC [295554] LOG: checkpoint complete: wrote 3
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.198 s,
> sync=0.004 s, total=0.228 s; sync files=2, longest=0.003>
> 2023-09-13 20:40:59.237 UTC [295031] LOG: database system is ready to
> accept connections
>
