pg_dump (COPY) hanging intermittently

From: Ben Snaidero <bensnaidero(at)geotab(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: pg_dump (COPY) hanging intermittently
Date: 2019-06-27 17:48:14
Message-ID: CAEPE5bPJHCjGw+9piTw8Mmhm3aDoGgq-fsD2nK1qR1ExtuxWmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I am running into a strange issue with Postgres 10 when using pg_dump with
the directory format and jobs parameter set it intermittently hangs. Seems
to occur less frequently the lower I set the jobs parameter but does happen
eventually even when set to 2 (could not reproduce when jobs=1). I've
tested with Postgres 11 and no matter how high I set the jobs parameter it
never hangs (tested all the way up to jobs=8). Also with Postgres 10 and
other formats I don't run into the issue. Here are my server specs and
software versions (OS and Postgres)

Windows Server 2016 DataCenter 10.0.14393
2 cores 4 logical processors
26GB memory

Postgres 11.4 (port 5433)
Postgres 10.9 (port 5432)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
And here are the psql statements and powershell script to recreate the
issue.

CREATE DATABASE pg_dump_hang;
\c pg_dump_hang
CREATE TABLE series1 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series2 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series3 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series4 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series5 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series6 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series7 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series8 AS SELECT i FROM generate_series(1,1000000) i;

$env:PGPASSWORD='#######'
$env:PGUSER='postgres'
$env:PGPORT=5433
set-location "C:\program files\postgresql\11\bin"
$a = 1
DO
{
remove-item -Recurse D:\dumphangtest
.\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest
pg_dump_hang
$a++
"Completed loop $a at -" + (date)
} While ($a -le 25)

$env:PGPASSWORD='#######'
$env:PGUSER='postgres'
$env:PGPORT=5432
set-location "c:\program files\postgresql\10\bin"
$a = 1
DO
{
remove-item -Recurse D:\dumphangtest
.\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest
pg_dump_hang
$a++
"Completed loop $a at -" + (date)
} While ($a -le 25)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is the output from pg_stat_activity and the filesystem listing when
the hang occurs.

--jobs=4
postgres=# select
pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35)
from pg_stat_activity where datname like 'pg_dump%';
pid | state | state_change |
query_start | wait_event | wait_event_type |
substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
5324 | idle in transaction | 2019-06-27 15:29:13.03416+01 | 2019-06-27
15:29:13.03154+01 | ClientRead | Client | SELECT at.attname,
(SELECT pg_catal
4144 | idle in transaction | 2019-06-27 15:29:16.222315+01 | 2019-06-27
15:29:15.070106+01 | ClientRead | Client | COPY public.series6
(i) TO stdout;
5576 | active | 2019-06-27 15:29:13.642069+01 | 2019-06-27
15:29:13.642068+01 | ClientWrite | Client | COPY public.series2
(i) TO stdout;
3860 | idle in transaction | 2019-06-27 15:29:17.109549+01 | 2019-06-27
15:29:16.198842+01 | ClientRead | Client | COPY public.series8
(i) TO stdout;
3120 | idle in transaction | 2019-06-27 15:29:16.350121+01 | 2019-06-27
15:29:15.089817+01 | ClientRead | Client | COPY public.series7
(i) TO stdout;
(5 rows)

D:\>dir dumphangtest
Volume in drive D has no label.
Volume Serial Number is C635-CDEF

Directory of D:\dumphangtest

06/27/2019 10:29 AM <DIR> .
06/27/2019 10:29 AM <DIR> ..
06/27/2019 10:29 AM 2,114,899 2863.dat.gz
06/27/2019 10:29 AM 376,832 2864.dat.gz
06/27/2019 10:29 AM 2,114,899 2865.dat.gz
06/27/2019 10:29 AM 2,114,899 2866.dat.gz
06/27/2019 10:29 AM 2,114,899 2867.dat.gz
06/27/2019 10:29 AM 2,114,899 2868.dat.gz
06/27/2019 10:29 AM 2,114,899 2869.dat.gz
06/27/2019 10:29 AM 2,114,899 2870.dat.gz
06/27/2019 10:29 AM 8,452 toc.dat
9 File(s) 15,189,577 bytes
2 Dir(s) 102,021,439,488 bytes free

--jobs=2
postgres=# select
pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35)
from pg_stat_activity where datname like 'pg_dump%';
pid | state | state_change |
query_start | wait_event | wait_event_type |
substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
3420 | idle in transaction | 2019-06-27 14:37:55.067559+01 | 2019-06-27
14:37:54.448198+01 | ClientRead | Client | COPY public.series8
(i) TO stdout;
3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27
14:37:49.53608+01 | ClientRead | Client | SELECT at.attname,
(SELECT pg_catal
5364 | active | 2019-06-27 14:37:50.151002+01 | 2019-06-27
14:37:50.151001+01 | ClientWrite | Client | COPY public.series2
(i) TO stdout;
(3 rows)

D:\>dir dumphangtest
Volume in drive D has no label.
Volume Serial Number is C635-CDEF

Directory of D:\dumphangtest

06/27/2019 09:37 AM <DIR> .
06/27/2019 09:37 AM <DIR> ..
06/27/2019 09:37 AM 2,114,899 2863.dat.gz
06/27/2019 09:37 AM 81,920 2864.dat.gz
06/27/2019 09:37 AM 2,114,899 2865.dat.gz
06/27/2019 09:37 AM 2,114,899 2866.dat.gz
06/27/2019 09:37 AM 2,114,899 2867.dat.gz
06/27/2019 09:37 AM 2,114,899 2868.dat.gz
06/27/2019 09:37 AM 2,114,899 2869.dat.gz
06/27/2019 09:37 AM 2,114,899 2870.dat.gz
06/27/2019 09:37 AM 8,452 toc.dat
9 File(s) 14,894,665 bytes
2 Dir(s) 102,056,497,152 bytes free

----------------------------------------------------------------------------------------------------------

And here is the stack trace from the thread of the hung process.

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe
ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7
ntoskrnl.exe!NtWaitForMultipleObjects+0xf9
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x456
postgres.exe!WaitEventSetWait+0x8e
postgres.exe!secure_write+0xeb
postgres.exe!pq_putbytes+0x4e5
postgres.exe!pq_putbytes+0x41b
postgres.exe!pq_putbytes+0x256
postgres.exe!CreateCopyDestReceiver+0x3784
postgres.exe!CreateCopyDestReceiver+0x17b2
postgres.exe!CreateCopyDestReceiver+0x144c
postgres.exe!CreateCopyDestReceiver+0x1098
postgres.exe!DoCopy+0x50a
postgres.exe!standard_ProcessUtility+0x37b
pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1
postgres.exe!ProcessUtility+0x34
postgres.exe!PortalRunFetch+0x891
postgres.exe!PortalRunFetch+0xbae
postgres.exe!PortalRun+0x1a0
postgres.exe!get_stats_option_name+0x1335
postgres.exe!PostgresMain+0x635
postgres.exe!ShmemBackendArrayAllocation+0x2b3a
postgres.exe!SubPostmasterMain+0x275
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x144f
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

Any ideas on changes in Postgres 10 that would cause this? We were
previously running 9.6.7 and didn't encounter this issue on that version
either.

Thanks
Ben.

Ben Snaidero
*Geotab*
Senior Database Specialist
Direct +1 (289) 230-7749
Toll-free +1 (877) 436-8221
Visit www.geotab.com
Twitter <https://twitter.com/geotab> | Facebook
<https://www.facebook.com/Geotab> | YouTube
<https://www.youtube.com/user/MyGeotab> | LinkedIn
<https://www.linkedin.com/company/geotab/>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2019-06-27 18:34:46 Re: pg_dump (COPY) hanging intermittently
Previous Message Peter Geoghegan 2019-06-27 17:09:53 Re: Postgresql 12 Beta2 Crashes for any Insert/Update