[PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-03-29 00:46:30
Message-ID: 2edb7a57-b225-3b23-a680-62ba90658fec@gmx.net
Lists: pgsql-hackers

Hello list,

I'm submitting a patch that addresses an almost 1-hour pause at the start
of a parallel pg_restore of a big archive. A related discussion took
place on the pgsql-performance mailing list at:

https://www.postgresql.org/message-id/flat/6bd16bdb-aa5e-0512-739d-b84100596035%40gmx.net

I think the commit message explains it rather well, so I'm pasting it
inline:

Improve the performance of parallel pg_restore (-j) from a custom-format
pg_dump archive that does not include data offsets, which typically
happens when pg_dump wrote the archive to stdout instead of to a file.

In this case, pg_restore workers loop constantly, reading small amounts
(4KB) and seeking forward short distances (around 10KB for a compressed
archive):

read(4, "..."..., 4096) = 4096
lseek(4, 55544369152, SEEK_SET) = 55544369152
read(4, "..."..., 4096) = 4096
lseek(4, 55544381440, SEEK_SET) = 55544381440
read(4, "..."..., 4096) = 4096
lseek(4, 55544397824, SEEK_SET) = 55544397824
read(4, "..."..., 4096) = 4096
lseek(4, 55544414208, SEEK_SET) = 55544414208
read(4, "..."..., 4096) = 4096
lseek(4, 55544426496, SEEK_SET) = 55544426496
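The seek distances in the trace can be checked with a little arithmetic. The C sketch below assumes, as the trace shows, that every read() returns exactly 4096 bytes, so the file position just before each lseek() is the previous target plus 4096. The names trace_seeks, skipped_bytes() and trace_gaps() are hypothetical, written only for this illustration:

```c
#include <stddef.h>

/* lseek() targets copied from the strace excerpt above. */
static const long long trace_seeks[] = {
    55544369152LL, 55544381440LL, 55544397824LL,
    55544414208LL, 55544426496LL,
};

/* Every read() in the excerpt returns this many bytes. */
#define TRACE_READ_SIZE 4096LL

/*
 * After lseek(prev) and a read of read_size bytes, the file position is
 * prev + read_size, so the following lseek(next) skips the difference.
 */
static long long
skipped_bytes(long long prev, long long next, long long read_size)
{
    return next - (prev + read_size);
}

/*
 * Store the skip distance between each consecutive pair of lseek()
 * targets in out[]; returns how many distances were stored.
 */
static size_t
trace_gaps(const long long *seeks, size_t n, long long *out)
{
    size_t i;

    for (i = 1; i < n; i++)
        out[i - 1] = skipped_bytes(seeks[i - 1], seeks[i], TRACE_READ_SIZE);
    return n > 0 ? n - 1 : 0;
}
```

For the five offsets shown, the gaps come out to 8192, 12288, 12288 and 8192 bytes, i.e. 8 to 12KB per jump, consistent with the "around 10KB" figure.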

This happens because each worker scans the whole file until it finds the
entry it wants, skipping forward over each data block. Combined with the
small block size of the custom-format dump, this causes many seeks and
poor performance.
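To make the pattern concrete, here is a minimal C sketch of a worker skipping over one entry's data when no offsets are available. It assumes a hypothetical, simplified block framing (a 4-byte little-endian length before each block, with a zero length ending the entry); pg_dump's actual on-disk integer encoding and function names differ, and read_block_len() and skip_entry_by_seeking() are made up for this illustration. The point is only that every block costs one seek:

```c
#define _POSIX_C_SOURCE 200809L /* for fseeko() and off_t */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Read a block length in the hypothetical framing used here: 4 bytes,
 * little-endian. Returns 0 on success, -1 on EOF or error.
 */
static int
read_block_len(FILE *fp, uint32_t *len)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *len = (uint32_t) b[0] | ((uint32_t) b[1] << 8) |
           ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
    return 0;
}

/*
 * Skip one entry's data by seeking past every block, as a worker without
 * data offsets has to. Returns the number of fseeko() calls issued, or
 * -1 on error.
 */
static long
skip_entry_by_seeking(FILE *fp)
{
    uint32_t len;
    long seeks = 0;

    for (;;) {
        if (read_block_len(fp, &len) != 0)
            return -1;
        if (len == 0)
            return seeks;       /* zero length marks the end of the entry */
        if (fseeko(fp, (off_t) len, SEEK_CUR) != 0)
            return -1;
        seeks++;                /* one seek per (small) block */
    }
}
```

With blocks of only a few KB, scanning tens of GB this way turns into millions of short seeks, which matches the read/lseek loop in the trace.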

Fix this by avoiding seeks for forward jumps shorter than 1MB, and doing
sequential reads instead.
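The fix can be sketched as a helper that reads and discards the bytes of short forward jumps and only seeks for long ones. This is an illustration under stated assumptions, not the patch's code: skip_forward() and SEQ_READ_THRESHOLD are hypothetical names, and "1MB" is taken to mean 1024 * 1024 bytes:

```c
#define _POSIX_C_SOURCE 200809L /* for fseeko() and off_t */
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical threshold: forward jumps shorter than this are done by
 * reading. Assumes "1MB" means 1024 * 1024 bytes.
 */
#define SEQ_READ_THRESHOLD ((off_t) 1024 * 1024)

/*
 * Move fp by len bytes. Short forward jumps read and discard the data,
 * so the file is consumed sequentially; long or backward jumps still use
 * fseeko(). Returns 0 on success, -1 on error or premature EOF.
 */
static int
skip_forward(FILE *fp, off_t len)
{
    char buf[8192];

    if (len < 0 || len >= SEQ_READ_THRESHOLD)
        return fseeko(fp, len, SEEK_CUR) == 0 ? 0 : -1;

    while (len > 0) {
        size_t chunk = len < (off_t) sizeof buf ? (size_t) len : sizeof buf;

        if (fread(buf, 1, chunk, fp) != chunk)
            return -1;          /* short read: EOF or I/O error */
        len -= (off_t) chunk;
    }
    return 0;
}
```

The trade-off, as we understand it: reading keeps the access pattern sequential, so buffering and kernel read-ahead can do their job, at the cost of transferring bytes that are thrown away; the threshold caps how much data is read just to be discarded.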

The performance gain can be significant, depending on the size of the
dump and the I/O subsystem. On my local NVMe drive, read speeds for that
phase of pg_restore increased from 150MB/s to 3GB/s.

This is my first patch submission, so all help is much appreciated.
Regards,
Dimitris

P.S. What is the recommended way to test a change, besides a generic make
check? And how do I run only the pg_dump/pg_restore tests, to speed up
my development cycle?

Attachment Content-Type Size
v1-0001-parallel-pg_restore-avoid-disk-seeks-when-jumping.patch text/x-patch 2.4 KB
