From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | pg_restore crash when there is a failure before all child process is created |
Date: | 2020-01-01 03:50:39 |
Message-ID: | CALDaNm1Luv-E3sarR+-unz-BjchquHHyfP+YC+2FS2pt_J+wxg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
I found a crash in pg_restore that occurs when there is a failure before
all the child workers have been created. The backtrace is given below:
#0 0x00007f9c6d31e337 in raise () from /lib64/libc.so.6
#1 0x00007f9c6d31fa28 in abort () from /lib64/libc.so.6
#2 0x00007f9c6d317156 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f9c6d317202 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000407c9e in WaitForTerminatingWorkers (pstate=0x14af7f0) at parallel.c:515
#5 0x0000000000407bf9 in ShutdownWorkersHard (pstate=0x14af7f0) at parallel.c:451
#6 0x0000000000407ae9 in archive_close_connection (code=1, arg=0x6315a0 <shutdown_info>) at parallel.c:368
#7 0x000000000041a7c7 in exit_nicely (code=1) at pg_backup_utils.c:99
#8 0x0000000000408180 in ParallelBackupStart (AH=0x14972e0) at parallel.c:967
#9 0x000000000040a3dd in RestoreArchive (AHX=0x14972e0) at pg_backup_archiver.c:661
#10 0x0000000000404125 in main (argc=6, argv=0x7ffd5146f308) at pg_restore.c:443
The sequence leading to the crash is:
- ParallelBackupStart initially sets pstate->numWorkers to the requested
number of workers.
- The workers are then created one by one.
- A failure occurs before all the child processes have been created.
- The parent then terminates the child processes and waits for all of
them to exit.
- WaitForTerminatingWorkers checks whether every process has terminated
by calling HasEveryWorkerTerminated.
- HasEveryWorkerTerminated always returns false, because it checks
numWorkers entries rather than the number of processes actually forked,
so execution hits the next assert, "Assert(j < pstate->numWorkers);".
The attached patch fixes this by setting pstate->numWorkers to the actual
worker count as each child process is created.
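To make the failure mode concrete, here is a small standalone model of the
behaviour described above. It is only a sketch and not the code in
parallel.c or in the attached patch: ModelSlot, ModelState, MAX_WORKERS and
the model_* functions are hypothetical names invented for illustration, and
the model assumes that a slot whose worker was never created is seen as
"not terminated" by the check.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKERS 8			/* hypothetical cap, for this model only */

typedef struct
{
	bool		started;		/* was a child created for this slot? */
	bool		terminated;		/* has that child been reaped? */
} ModelSlot;

typedef struct
{
	int			numWorkers;		/* number of slots the wait loop scans */
	ModelSlot	slots[MAX_WORKERS];
} ModelState;

/*
 * Try to start "requested" workers, failing just before worker "failAt".
 * With countAsCreated = false, numWorkers is set up front (the behaviour
 * described in the report); with true, it follows the number of workers
 * actually created (the approach the report proposes).
 * Returns the number of workers created.
 */
static int
model_start(ModelState *st, int requested, int failAt, bool countAsCreated)
{
	int			created = 0;

	assert(requested >= 0 && requested <= MAX_WORKERS);

	for (int i = 0; i < MAX_WORKERS; i++)
	{
		st->slots[i].started = false;
		st->slots[i].terminated = false;
	}

	st->numWorkers = countAsCreated ? 0 : requested;

	for (int i = 0; i < requested; i++)
	{
		if (i == failAt)
			break;				/* simulated failure during startup */
		st->slots[i].started = true;
		created++;
		if (countAsCreated)
			st->numWorkers = created;
	}
	return created;
}

/* Model of the parent reaping every child that was actually started. */
static void
model_reap_all(ModelState *st)
{
	for (int i = 0; i < MAX_WORKERS; i++)
		if (st->slots[i].started)
			st->slots[i].terminated = true;
}

/* Analogue of the "has every worker terminated" check: scans numWorkers slots. */
static bool
model_all_terminated(const ModelState *st)
{
	for (int i = 0; i < st->numWorkers; i++)
		if (!st->slots[i].terminated)
			return false;
	return true;
}
```

In this model, requesting 4 workers and failing before the third leaves
model_all_terminated() false even after model_reap_all() when numWorkers is
set up front, whereas it becomes true when numWorkers tracks the workers
actually created.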
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
0001-pg_restore-crash-when-there-is-a-failure-before-all-worker-creation.patch | application/x-patch | 2.7 KB |