Re: parallel pg_restore blocks on heavy random read I/O on all children processes

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: parallel pg_restore blocks on heavy random read I/O on all children processes
Date: 2025-03-27 23:45:58
Message-ID: 1cbb9bd6-60cd-92cb-c3c2-4cf4fd8a7b64@gmx.net
Lists: pgsql-performance

Hello again,

I traced the seek-and-read behaviour of parallel pg_restore inside
_skipData() when called from _PrintTocData(). Since most of today's I/O
devices (both rotating and solid state) can read 1MB sequentially faster
than they can seek and read 4KB, I tried the following change:

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 55107b20058..262ba509829 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -618,31 +618,31 @@ _skipLOs(ArchiveHandle *AH)
  * Skip data from current file position.
  * Data blocks are formatted as an integer length, followed by data.
  * A zero length indicates the end of the block.
  */
 static void
 _skipData(ArchiveHandle *AH)
 {
 	lclContext *ctx = (lclContext *) AH->formatData;
 	size_t blkLen;
 	char *buf = NULL;
 	int buflen = 0;
 
 	blkLen = ReadInt(AH);
 	while (blkLen != 0)
 	{
-		if (ctx->hasSeek)
+		if (ctx->hasSeek && blkLen > 1024 * 1024)
 		{
 			if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
 				pg_fatal("error during file seek: %m");
 		}
 		else
 		{
 			if (blkLen > buflen)
 			{
 				free(buf);
 				buf = (char *) pg_malloc(blkLen);
 				buflen = blkLen;
 			}
 			if (fread(buf, 1, blkLen, AH->FH) != blkLen)
 			{
 				if (feof(AH->FH))

This simple change speeds up the offset-table building phase of the
parallel restore immensely (maybe 10x, depending on the number of workers).
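For illustration, here is a minimal standalone sketch of the same "read small blocks, seek only over large ones" policy. It assumes a POSIX system and a hypothetical block format (a native-endian uint32_t length before each block, zero length terminates), not pg_dump's real ReadInt() encoding. The names skip_block, skip_data, read_len and SKIP_SEEK_THRESHOLD are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical cutoff, mirroring the 1MB used in the patch. */
#define SKIP_SEEK_THRESHOLD ((size_t) 1024 * 1024)

/*
 * Skip one block of blkLen bytes.  Blocks up to the threshold are consumed
 * with fread(), which keeps the access pattern sequential; larger blocks
 * are skipped with fseeko() when the stream is seekable.
 * Returns 0 on success, -1 on a failed seek, allocation or short read.
 */
static int
skip_block(FILE *fh, size_t blkLen, int has_seek, char **buf, size_t *buflen)
{
	if (has_seek && blkLen > SKIP_SEEK_THRESHOLD)
		return fseeko(fh, (off_t) blkLen, SEEK_CUR) == 0 ? 0 : -1;

	if (blkLen > *buflen)
	{
		char	   *nbuf = realloc(*buf, blkLen);

		if (nbuf == NULL)
			return -1;
		*buf = nbuf;
		*buflen = blkLen;
	}
	return fread(*buf, 1, blkLen, fh) == blkLen ? 0 : -1;
}

/* Read a length prefix (hypothetical format: native-endian uint32_t). */
static int
read_len(FILE *fh, uint32_t *len)
{
	return fread(len, sizeof *len, 1, fh) == 1 ? 0 : -1;
}

/* Skip length-prefixed blocks until a zero length is read. */
static int
skip_data(FILE *fh, int has_seek)
{
	char	   *buf = NULL;
	size_t		buflen = 0;
	uint32_t	blkLen;
	int			rc = 0;

	assert(fh != NULL);
	if (read_len(fh, &blkLen) != 0)
		return -1;
	while (blkLen != 0)
	{
		if (skip_block(fh, blkLen, has_seek, &buf, &buflen) != 0 ||
			read_len(fh, &blkLen) != 0)
		{
			rc = -1;
			break;
		}
	}
	free(buf);
	return rc;
}
```

The scratch buffer only grows and its contents are discarded, so realloc() is enough here. The 1MB cutoff is simply the value from the patch. The best value presumably depends on the device and on readahead settings.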

A remaining problem is that this offset-table building phase runs in every
worker process, which means that all workers scan the whole archive, almost
simultaneously. A more intrusive improvement would be to move this phase to
the parent process, before spawning the children.
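A rough sketch of that idea follows, under stated assumptions: a POSIX system and a hypothetical archive layout (a native-endian uint32_t length before each entry's payload, zero length ends the archive), which is not pg_dump's real TOC format. The parent scans the file once and records each payload offset. The forked workers inherit that table and read their entries with pread(). All names here (OffsetEntry, build_offset_table, worker, run_workers) are illustrative and are not pg_restore functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One entry of the hypothetical offset table. */
typedef struct
{
	off_t		data_off;		/* file offset of the payload */
	uint32_t	len;			/* payload length in bytes */
} OffsetEntry;

/* Parent: scan the archive once, recording where each payload starts. */
static size_t
build_offset_table(FILE *fh, OffsetEntry *tab, size_t max)
{
	size_t		n = 0;
	uint32_t	len;

	assert(fh != NULL && tab != NULL);
	rewind(fh);
	while (n < max && fread(&len, sizeof len, 1, fh) == 1 && len != 0)
	{
		tab[n].data_off = ftello(fh);
		tab[n].len = len;
		if (fseeko(fh, (off_t) len, SEEK_CUR) != 0)
			break;
		n++;
	}
	return n;
}

/* Child: read its share of entries straight from the precomputed offsets. */
static int
worker(int fd, const OffsetEntry *tab, size_t n, int id, int nworkers)
{
	for (size_t i = (size_t) id; i < n; i += (size_t) nworkers)
	{
		char	   *buf = malloc(tab[i].len);
		ssize_t		got;

		if (buf == NULL)
			return 1;
		/* pread() leaves the file offset shared with siblings untouched */
		got = pread(fd, buf, tab[i].len, tab[i].data_off);
		/* a real restore would process buf here */
		free(buf);
		if (got != (ssize_t) tab[i].len)
			return 1;
	}
	return 0;
}

/* Fork workers that inherit the table; returns how many failed. */
static int
run_workers(FILE *fh, const OffsetEntry *tab, size_t n, int nworkers)
{
	int			fd = fileno(fh);
	int			spawned = 0;
	int			failed = 0;

	fflush(NULL);				/* children must not inherit unflushed buffers */
	for (int id = 0; id < nworkers; id++)
	{
		pid_t		pid = fork();

		if (pid < 0)
			failed++;
		else if (pid == 0)
			_exit(worker(fd, tab, n, id, nworkers));
		else
			spawned++;
	}
	for (int i = 0; i < spawned; i++)
	{
		int			status;

		if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

pread() is used because children created by fork() share the parent's open file description, and therefore its file offset. Concurrent fseeko()/fread() calls on that shared offset would race. Reopening the file in each child is the other common way around this.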

What do you think?

Regards,
Dimitris

P.S. I also have a simple change, made for debugging purposes, that makes
the -j1 switch mean "parallel, but with one worker process". Not sure if it
is of interest here.
