| From: | vignesh C <vignesh21(at)gmail(dot)com> | 
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | pg_restore crash when there is a failure before all child process is created | 
| Date: | 2020-01-01 03:50:39 | 
| Message-ID: | CALDaNm1Luv-E3sarR+-unz-BjchquHHyfP+YC+2FS2pt_J+wxg@mail.gmail.com | 
| Lists: | pgsql-hackers | 
Hi,
I found a crash in pg_restore that occurs when there is a failure before all the child workers have been created. The backtrace is given below:
#0  0x00007f9c6d31e337 in raise () from /lib64/libc.so.6
#1  0x00007f9c6d31fa28 in abort () from /lib64/libc.so.6
#2  0x00007f9c6d317156 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f9c6d317202 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000407c9e in WaitForTerminatingWorkers (pstate=0x14af7f0) at parallel.c:515
#5  0x0000000000407bf9 in ShutdownWorkersHard (pstate=0x14af7f0) at parallel.c:451
#6  0x0000000000407ae9 in archive_close_connection (code=1, arg=0x6315a0 <shutdown_info>) at parallel.c:368
#7  0x000000000041a7c7 in exit_nicely (code=1) at pg_backup_utils.c:99
#8  0x0000000000408180 in ParallelBackupStart (AH=0x14972e0) at parallel.c:967
#9  0x000000000040a3dd in RestoreArchive (AHX=0x14972e0) at pg_backup_archiver.c:661
#10 0x0000000000404125 in main (argc=6, argv=0x7ffd5146f308) at pg_restore.c:443
The problem is as follows:
   - ParallelBackupStart initially sets pstate->numWorkers to the requested number of workers.
   - The workers are then created one by one.
   - A failure occurs before all of the worker processes have been created.
   - The parent then terminates the child processes and waits for all of them to exit.
   - WaitForTerminatingWorkers checks whether all the processes have terminated by calling HasEveryWorkerTerminated.
   - HasEveryWorkerTerminated always returns false, because it checks pstate->numWorkers slots rather than the number of processes actually forked, so WaitForTerminatingWorkers goes on to hit the assertion "Assert(j < pstate->numWorkers);".
The attached patch fixes this by setting pstate->numWorkers to the actual number of workers as each child process is created.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
| Attachment | Content-Type | Size | 
|---|---|---|
| 0001-pg_restore-crash-when-there-is-a-failure-before-all-worker-creation.patch | application/x-patch | 2.7 KB | 