Re: Warn when parallel restoring a custom dump without data offsets

From: David Gilman <davidgilman1(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Warn when parallel restoring a custom dump without data offsets
Date: 2020-05-20 12:55:23
Message-ID: CALBH9DAmhfBw=VsKCcmig9jP2pBtuyLU5U5Tun_G=3ULs+q-dg@mail.gmail.com
Lists: pgsql-hackers

Your understanding of the issue is mostly correct:

> I think the PG11
> commit you mentioned (548e5097) happens to make some databases fail in
> parallel restore that previously worked (I didn't check).

Correct. If you bisect around that commit yourself, you'll see
pg_restore start failing with the expected "possibly due to
out-of-order restore request" error on offset-less dumps. It is a
known issue, but it's documented only in code comments and nowhere
user-facing, which is sending people to StackOverflow.

> If the input is unseekable, then we can
> never do a parallel restore at all.

I don't know if this is strictly true. Imagine a database dump of a
single large table with a few indexes, simple enough that everything
in the file is already in restore order. It might seem silly to
parallel restore a single table, but remember that pg_restore also
creates indexes in parallel, and on a typical development workstation
with a few CPU cores and an SSD that is a substantial improvement.
There are probably other corner cases where you get lucky with an
offset-less dump and it works. That's why my gut instinct was to warn
instead of fail.

> If it *is* seekable, could we
> make _PrintTocData rewind if it gets to EOF using ftello(SEEK_SET, 0)
> and re-scan again from the beginning? Would you want to try that ?

I will try this and report back. I will also see if I can get an strace.
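
To make the rewind idea concrete, here is a rough, self-contained sketch.
It is not pg_restore code: the block layout, the ToyBlockHeader struct and
the toy_* functions are all hypothetical, and it uses plain fseek/ftell for
brevity where large-file code would normally want the POSIX fseeko/ftello
variants. It shows a forward-only scan that succeeds while requests arrive
in file order, fails at EOF on an out-of-order request, and recovers if it
is allowed to seek back to offset 0 once and rescan up to its starting point.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy block header; NOT the pg_dump custom format. */
typedef struct
{
    int id;     /* block identifier */
    int len;    /* number of payload bytes that follow */
} ToyBlockHeader;

/* Append one block (header plus payload) at the current position. */
static bool
toy_write_block(FILE *fp, int id, const char *data, int len)
{
    ToyBlockHeader hdr = {id, len};

    return fwrite(&hdr, sizeof(hdr), 1, fp) == 1 &&
           fwrite(data, 1, (size_t) len, fp) == (size_t) len;
}

/*
 * Scan forward from the current position for block wanted_id and copy
 * its payload into buf. Returns the payload length, or -1 if the block
 * was not found, did not fit in buf, or the input is not seekable.
 *
 * If allow_rewind is true and EOF is reached, seek to offset 0 once and
 * rescan up to the position where this call started.
 */
static int
toy_read_block(FILE *fp, int wanted_id, char *buf, int bufsize,
               bool allow_rewind)
{
    long start = ftell(fp);
    bool rewound = false;
    ToyBlockHeader hdr;

    if (start < 0)
        return -1;              /* assumption: toy requires seekable input */

    for (;;)
    {
        /* After a rewind, stop once we are back where we began. */
        if (rewound && ftell(fp) >= start)
            return -1;

        if (fread(&hdr, sizeof(hdr), 1, fp) != 1)
        {
            /* EOF (or short read): give up unless one rewind is allowed. */
            if (!allow_rewind || rewound)
                return -1;
            if (fseek(fp, 0L, SEEK_SET) != 0)
                return -1;
            rewound = true;
            continue;
        }

        if (hdr.id == wanted_id)
        {
            if (hdr.len < 0 || hdr.len > bufsize)
                return -1;
            if (fread(buf, 1, (size_t) hdr.len, fp) != (size_t) hdr.len)
                return -1;
            return hdr.len;
        }

        /* Not the block we want: skip its payload. */
        if (fseek(fp, (long) hdr.len, SEEK_CUR) != 0)
            return -1;
    }
}
```

With blocks written in the order 1, 2, 3, reading 1 then 3 works without
rewinding, a later request for 2 fails without rewinding, and the same
request succeeds once the single rewind is allowed.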

--
David Gilman
:DG<
