From: | Hannu Krosing <hannuk(at)google(dot)com> |
---|---|
To: | Dimitrios Apostolou <jimis(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
Date: | 2025-04-10 06:50:33 |
Message-ID: | CAMT0RQTz7Zi99C66U2160Mmcxj+fBJ0OpD6Eq=aLZsdYaFZwBg@mail.gmail.com |
Lists: | pgsql-performance |
You may be interested in a patch "Adding pg_dump flag for parallel
export to pipes"[1] which allows using pipes in directory-format
parallel dump and restore.
There, the offsets are implicitly taken care of by the file system.
On Sun, Mar 23, 2025 at 4:46 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Dimitrios Apostolou <jimis(at)gmx(dot)net> writes:
> > On Thu, 20 Mar 2025, Tom Lane wrote:
> >> I am betting that the problem is that the dump's TOC (table of
> >> contents) lacks offsets to the actual data of the database objects,
> >> and thus the readers have to reconstruct that information by scanning
> >> the dump file. Normally, pg_dump will back-fill offset data in the
> >> TOC at completion of the dump, but if it's told to write to an
> >> un-seekable output file then it cannot do that.
>
> > Further questions:
>
> > * Does the same happen in an uncompressed dump? Or maybe the offsets are
> > pre-filled because they are predictable without compression?
>
> Yes; no. We don't know the size of a table's data as-dumped until
> we've dumped it.
>
> > * Should pg_dump print some warning for generating a lower quality format?
>
> I don't think so. In many use-cases this is irrelevant and the
> warning would just be an annoyance.
>
> > * The seeking pattern in pg_restore seems non-sensical to me: reading 4K,
> > jumping 8-12K, repeat for the whole file? Consuming 15K IOPS for an
> > hour. /Maybe/ something to improve there... Where can I read more about
> > the format?
>
> It's reading data blocks (or at least the headers thereof), which have
> a limited size. I don't think that size has changed since circa 1999,
> so maybe we could consider increasing it; but I doubt we could move
> the needle very far that way.
>
> > * Why doesn't it happen in single-process pg_restore?
>
> A single-process restore is going to restore all the data in the order
> it appears in the archive file, so no seeking is required. Of course,
> as soon as you ask for parallelism, that doesn't work too well.
>
> Hypothetically, maybe the algorithm for handing out tables-to-restore
> to parallel workers could pay attention to the distance to the data
> ... except that in the problematic case we don't have that
> information. I don't recall for sure, but I think that the order of
> the TOC entries is not necessarily a usable proxy for the order of the
> data entries. It's unclear to me that overriding the existing
> heuristic (biggest tables first, I think) would be a win anyway.
>
> regards, tom lane
>
>
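The quoted explanation (offsets back-filled into the TOC only when the output is seekable, and readers scanning block headers otherwise) can be illustrated with a toy model. This is only a sketch under loud assumptions: the archive layout, the `UNKNOWN` placeholder, and the names `write_archive`, `read_table`, and `PipeLike` are all hypothetical and do not reflect the real pg_dump custom format or its code.

```python
import io
import struct

# Hypothetical placeholder meaning "offset was never recorded in the TOC".
UNKNOWN = 0xFFFFFFFFFFFFFFFF


class PipeLike:
    """Stand-in for an un-seekable output such as a pipe (illustrative only)."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        return len(data)

    def seekable(self):
        return False


def write_archive(out, tables):
    """Write a toy archive: entry count, TOC offset slots, then data blocks.

    Each data block is a 4-byte length header followed by the payload.
    Offsets are only known after each block is written, so the TOC starts
    with placeholders and is back-filled at the end if `out` can seek.
    """
    names = list(tables)
    out.write(struct.pack("<I", len(names)))
    for _ in names:
        out.write(struct.pack("<Q", UNKNOWN))

    offsets = []
    pos = 4 + 8 * len(names)  # first byte after the TOC
    for name in names:
        data = tables[name]
        offsets.append(pos)
        out.write(struct.pack("<I", len(data)))
        out.write(data)
        pos += 4 + len(data)

    if out.seekable():
        out.seek(4)  # back to the first TOC slot
        for off in offsets:
            out.write(struct.pack("<Q", off))
        out.seek(0, io.SEEK_END)


def read_table(blob, index):
    """Return (payload, block_headers_read) for the table at `index`."""
    (count,) = struct.unpack_from("<I", blob, 0)
    (off,) = struct.unpack_from("<Q", blob, 4 + 8 * index)
    headers_read = 0
    if off == UNKNOWN:
        # No offset in the TOC: walk the block headers from the start.
        off = 4 + 8 * count
        for _ in range(index):
            (length,) = struct.unpack_from("<I", blob, off)
            headers_read += 1
            off += 4 + length
    (length,) = struct.unpack_from("<I", blob, off)
    headers_read += 1
    return bytes(blob[off + 4 : off + 4 + length]), headers_read


tables = {"a": b"x" * 10, "b": b"y" * 20, "c": b"z" * 5}

seekable_out = io.BytesIO()
write_archive(seekable_out, tables)
with_offsets = seekable_out.getvalue()

pipe_out = PipeLike()
write_archive(pipe_out, tables)
without_offsets = bytes(pipe_out.buf)

print(read_table(with_offsets, 2))
print(read_table(without_offsets, 2))
```

In this model a reader that wants the last table reads one block header when the TOC has offsets, and one header per preceding block when it does not. With several parallel workers each starting such a scan independently, the header reads multiply, which is at least consistent with the small-read, short-jump pattern described in the thread.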