Re: pg_dump crashes

From: Nico De Ranter <nico(dot)deranter(at)esaturnus(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump crashes
Date: 2020-05-22 14:55:20
Message-ID: CALVv0fbuUdYxdeFb2MH1JQEbRDQER=xjcw3WptBpAi2nyy1FUQ@mail.gmail.com
Lists: pgsql-general

Correct.

If I run 'pg_dumpall --cluster 11/main --file=dump.sql', the end of the
file looks like this:

###### cut here
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N
??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N
??????????????????????????????
\.

###### cut here

If I run 'pg_dump --table=public.file --cluster 11/main --file=dump-2.sql
bacula', those lines are actually followed by about 850 more lines that
look OK. I'm assuming the difference is due to buffering.
However, the fact that I do see a number of regular lines following these
may suggest it's just garbage in the table that isn't really causing the
issue after all.
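
Untested side note: the repeated values in those rows decode to runs of '?'
bytes (1061109567 = 0x3F3F3F3F, 16191 = 0x3F3F, and 4557430888798830399 =
0x3F3F3F3F3F3F3F3F), which also points at garbage in the data rather than a
pg_dump problem. If those rows are still readable outside of COPY, a rough
sketch like the query below should show which heap blocks they live in (it
may of course crash the backend the same way COPY does):

-- rough, untested sketch: locate the suspicious rows in the heap
-- (the fileid value is copied from the dump output above)
SELECT ctid, fileid, jobid
FROM public.file
WHERE fileid = 4557430888798830399;

If even that crashes the backend, it would at least confirm the damage is in
the table itself and not something specific to COPY.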

Nico

On Fri, May 22, 2020 at 4:47 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 5/22/20 6:40 AM, Nico De Ranter wrote:
> > I was just trying that. It's always the same (huge) table that crashes
> > the pg_dump. Running a dump excluding that one table goes fine,
> > running a dump of only that one table crashes.
> > In the system logs I always see a segfault
> >
> > May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault
> > at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in
> > postgres[55f1977c0000+727000]
> >
> > It doesn't seem to be an Out-of-memory thing (at least not on the OS
> > level).
> > The database is currently installed on a dedicated server with 32GB
> > RAM. I tried tweaking some of the memory parameters for postgres, but
> > the crash always happens at the exact same spot (if I run pg_dump for
> > that one table with and without memory tweaks the resulting files are
> > identical).
> >
> > One thing I just noticed looking at the dump file: at around the end of
> > the file I see this:
>
> So the below is the output from?:
>
> pg_dumpall --cluster 11/main --file=dump.sql
>
> >
> > 2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC
> > BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
> > *4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191
> > \N \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????
> > 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> > \N ??????????????????????????????*
> > 2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A
> > Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR
> >
> > It looks suspicious however there are about 837 more lines before the
> > output stops.
> >
> > Nico
> >
> > On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
> >
> > On 5/22/20 5:37 AM, Nico De Ranter wrote:
> > > Hi all,
> > >
> > > Postgres version: 9.5
> > > OS: Ubuntu 18.04.4
> > >
> > > I have a 144GB Bacula database that crashes the postgres daemon when I
> > > try to do a pg_dump.
> > > At some point the server ran out of diskspace for the database storage.
> > > I expanded the lvm and rebooted the server. It seemed to work fine,
> > > however when I try to dump the bacula database the postgres daemon dies
> > > after about 37GB.
> > >
> > > I tried copying the database to another machine and upgrading postgres
> > > to 11 using pg_upgrade. The upgrade seems to work but I still get
> > > exactly the same problem when trying to dump the database.
> > >
> > > postgres(at)core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> > > pg_dump: Dumping the contents of table "file" failed: PQgetCopyData() failed.
> > > pg_dump: Error message from server: server closed the connection unexpectedly
> > > This probably means the server terminated abnormally
> > > before or while processing the request.
> > > pg_dump: The command was: COPY public.file (fileid, fileindex, jobid,
> > > pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> > > pg_dumpall: pg_dump failed on database "bacula", exiting
> >
> > What happens if you try to dump just this table?
> >
> > Something along lines of:
> >
> > pg_dump -t file -d some_db -U some_user
> >
> > Have you looked at the system logs to see if it is the OS killing the
> > process?
> >
> >
> > >
> > > In the logs I see:
> > >
> > > 2020-05-22 14:23:30.649 CEST [12768] LOG: server process (PID 534) was
> > > terminated by signal 11: Segmentation fault
> > > 2020-05-22 14:23:30.649 CEST [12768] DETAIL: Failed process was running:
> > > COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
> > > deltaseq, markid, lstat, md5) TO stdout;
> > > 2020-05-22 14:23:30.651 CEST [12768] LOG: terminating any other active
> > > server processes
> > > 2020-05-22 14:23:30.651 CEST [482] WARNING: terminating connection
> > > because of crash of another server process
> > > 2020-05-22 14:23:30.651 CEST [482] DETAIL: The postmaster has commanded
> > > this server process to roll back the current transaction and exit,
> > > because another server process exited abnormally and possibly corrupted
> > > shared memory.
> > > 2020-05-22 14:23:30.651 CEST [482] HINT: In a moment you should be able
> > > to reconnect to the database and repeat your command.
> > > 2020-05-22 14:23:30.652 CEST [12768] LOG: all server processes
> > > terminated; reinitializing
> > > 2020-05-22 14:23:30.671 CEST [578] LOG: database system was
> > > interrupted; last known up at 2020-05-22 14:15:19 CEST
> > > 2020-05-22 14:23:30.809 CEST [578] LOG: database system was not
> > > properly shut down; automatic recovery in progress
> > > 2020-05-22 14:23:30.819 CEST [578] LOG: redo starts at 197/D605EA18
> > > 2020-05-22 14:23:30.819 CEST [578] LOG: invalid record length at
> > > 197/D605EA50: wanted 24, got 0
> > > 2020-05-22 14:23:30.819 CEST [578] LOG: redo done at 197/D605EA18
> > > 2020-05-22 14:23:30.876 CEST [12768] LOG: database system is ready to
> > > accept connections
> > > 2020-05-22 14:29:07.511 CEST [12768] LOG: received fast shutdown request
> > >
> > >
> > > Any ideas how to fix or debug this?
> > >
> > > Nico
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> >
> >
> >
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

--

Nico De Ranter

Operations Engineer

T. +32 16 38 72 10

eSATURNUS
Philipssite 5, D, box 28
3001 Leuven – Belgium

T. +32 16 40 12 82
F. +32 16 40 84 77
www.esaturnus.com

For Service & Support:

Support Line Belgium: +32 2 2009897

Support Line International: +44 12 56 68 38 78

Or via email : medical(dot)services(dot)eu(at)sony(dot)com
