[PATCH] parallel pg_restore: move offset-building phase to before forking

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [PATCH] parallel pg_restore: move offset-building phase to before forking
Date: 2025-04-04 17:11:31
Message-ID: b51f7c7a-f31b-f0e1-fc17-5bb4c3057ef5@gmx.net
Lists: pgsql-hackers

Hello list,

based on the delays I experienced in pg_restore, as described at:

https://www.postgresql.org/message-id/flat/6bd16bdb-aa5e-0512-739d-b84100596035(at)gmx(dot)net

I noticed that every one of the pg_restore worker processes exhibited the
same seek-and-read behaviour, in parallel, making the situation even
worse. With this patch I moved this phase to the parent process, before
fork(), so that the children have the necessary information from birth.

Copying the commit message:

A pg_dump custom format archive without offsets in the table of
contents is usually generated when pg_dump writes to stdout instead of
a file. When doing parallel pg_restore (-j) from such a file, every
worker process was scanning the full archive sequentially, in order to
build the offset table and find the parts it was assigned to restore.
This led to the worker processes competing for I/O.

This patch moves this offset-table building phase to the parent process,
before forking the worker processes.

The upside is that we now have only one extra scan of the file.
And this scan happens without other competing I/O, so it completes
faster.

The downside is that there is a delay before spawning the children and
starting to assign jobs to them.
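
To illustrate the idea outside of the pg_restore code, here is a minimal
Python sketch. It is not the patch and does not use the real custom
archive format: the toy file layout (a 4-byte length followed by the
payload, with no index) and the names write_toy_archive,
build_offset_table and restore_in_parallel are all assumptions made up
for this example. The parent scans the file once to build the offset
table, then forks; each child inherits the table and seeks straight to
its blocks instead of rescanning.

```python
import os
import struct


def write_toy_archive(path, blocks):
    """Write each block as [4-byte length][payload], with no offset index."""
    with open(path, "wb") as f:
        for payload in blocks:
            f.write(struct.pack("<I", len(payload)))
            f.write(payload)


def build_offset_table(path):
    """One sequential scan recording (payload offset, length) per block."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack("<I", header)
            offsets.append((f.tell(), length))
            f.seek(length, os.SEEK_CUR)  # skip the payload
    return offsets


def restore_in_parallel(path, n_workers):
    # The parent builds the table once, BEFORE forking, so every child
    # starts with a copy of it and never scans the archive itself.
    offsets = build_offset_table(path)

    workers = []
    for w in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: use the inherited table to seek directly to its blocks.
            try:
                os.close(read_fd)
                with open(path, "rb") as f, os.fdopen(write_fd, "wb") as out:
                    for i in range(w, len(offsets), n_workers):
                        pos, length = offsets[i]
                        f.seek(pos)
                        data = f.read(length)
                        out.write(struct.pack("<II", i, length) + data)
            finally:
                os._exit(0)
        os.close(write_fd)
        workers.append((pid, read_fd))

    # Parent: collect what each child "restored", keyed by block index.
    results = {}
    for pid, read_fd in workers:
        with os.fdopen(read_fd, "rb") as inp:
            while True:
                header = inp.read(8)
                if len(header) < 8:
                    break
                i, length = struct.unpack("<II", header)
                results[i] = inp.read(length)
        os.waitpid(pid, 0)
    return [results[i] for i in range(len(offsets))]
```

In this sketch the file is scanned exactly once no matter how many
workers are used, and the price is that no child starts until that scan
has finished.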

What do you think?

Thanks,
Dimitris

Attachment Content-Type Size
v1-0001-parallel-pg_restore-move-offset-building-phase-to.patch text/x-patch 5.8 KB
