Re: pg_upgrade failing for 200+ million Large Objects

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Bruce Momjian <bruce(at)momjian(dot)us>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robins Tharakan <tharakan(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: 2024-01-12 22:42:27
Message-ID: 20240112224227.GA3857154@nathanxps13
Lists: pgsql-hackers

On Wed, Dec 20, 2023 at 06:47:44PM -0500, Tom Lane wrote:
> * I did not invent a switch to control the batching of blobs; it's
> just hard-wired at 1000 blobs per group here. Probably we need some
> user knob for that, but I'm unsure if we want to expose a count or
> just a boolean for one vs more than one blob per batch. The point of
> forcing one blob per batch would be to allow exact control during
> selective restore, and I'm not sure if there's any value in random
> other settings. On the other hand, selective restore of blobs has
> been completely broken for the last dozen years and I can't recall any
> user complaints about that; so maybe nobody cares and we could just
> leave this as an internal choice.

I think the argument for making this configurable is that folks might have
fewer, larger blobs, in which case it might make sense to lower the batch
size, or many smaller blobs, in which case they might want to raise it.
But I'm a bit skeptical that it would make any great difference in
practice, and I'm not sure how a user would choose the right value other
than by trial and error. In any case, at the moment I think it'd be okay
to keep this an internal setting and wait for feedback from the field.

> * As the patch stands, we still build a separate TOC entry for each
> comment or seclabel or ACL attached to a blob. If you have a lot of
> blobs with non-default properties then the TOC bloat problem comes
> back again. We could do something about that, but it would take a bit
> of tedious refactoring, and the most obvious way to handle it probably
> re-introduces too-many-locks problems. Is this a scenario that's
> worth spending a lot of time on?

I'll ask around about this.

> Subject: [PATCH v9 1/4] Some small preliminaries for pg_dump changes.

This one looked good to me.

> Subject: [PATCH v9 2/4] In dumps, group large objects into matching metadata
> and data entries.

I spent most of my review time reading this one. Overall, it looks pretty
good to me, and I think you've called out most of the interesting design
choices.

> + char *cmdEnd = psprintf(" OWNER TO %s", fmtId(te->owner));
> +
> + IssueCommandPerBlob(AH, te, "ALTER LARGE OBJECT ", cmdEnd);

This is just a nitpick, but is there any reason not to have
IssueCommandPerBlob() accept a format string and the corresponding
arguments?
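
For illustration, here's roughly the shape I had in mind, with the
pg_dump-specific pieces (ArchiveHandle, TocEntry, fmtId) stubbed out;
the names are mine and not meant as the actual interface:

    /*
     * Illustrative only: let the function format the trailing part of the
     * per-blob command itself, so callers don't need a separate psprintf().
     */
    #include <stdarg.h>
    #include <stdio.h>

    static void
    IssueCommandPerBlobFmt(const char *const *oids, int noids,
                           const char *cmdBegin, const char *fmt, ...)
    {
        char        cmdEnd[1024];
        va_list     args;
        int         i;

        va_start(args, fmt);
        vsnprintf(cmdEnd, sizeof(cmdEnd), fmt, args);
        va_end(args);

        /* one command per large object, as the real function does */
        for (i = 0; i < noids; i++)
            printf("%s%s%s;\n", cmdBegin, oids[i], cmdEnd);
    }

    int
    main(void)
    {
        const char *oids[] = {"16401", "16402", "16403"};

        IssueCommandPerBlobFmt(oids, 3, "ALTER LARGE OBJECT ",
                               " OWNER TO %s", "joe");
        return 0;
    }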

> + while (n < 1000 && i + n < ntups)

Another nitpick: it might be worth moving these hard-wired constants to
macros. Without a control switch, that'd at least make it easy for anyone
determined to change the value for their installation.
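
Something along these lines (a self-contained sketch with a made-up macro
name; the patch currently hard-codes 1000 inline):

    #include <stdio.h>

    #define LO_METADATA_BATCH_SIZE 1000

    int
    main(void)
    {
        int         ntups = 2500;   /* pretend this many large objects */
        int         i = 0;

        while (i < ntups)
        {
            int         n = 0;

            /* same shape as the loop quoted above */
            while (n < LO_METADATA_BATCH_SIZE && i + n < ntups)
                n++;

            printf("one TOC entry for blobs %d..%d\n", i, i + n - 1);
            i += n;
        }
        return 0;
    }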

> Subject: [PATCH v9 3/4] Move BLOBS METADATA TOC entries into SECTION_DATA.

This one looks reasonable, too.

> In this patch I have just hard-wired pg_upgrade to use
> --transaction-size 1000. Perhaps there would be value in adding
> another pg_upgrade option to allow user control of that, but I'm
> unsure that it's worth the trouble; I think few users would use it,
> and any who did would see not that much benefit. However, we
> might need to adjust the logic to make the size be 1000 divided
> by the number of parallel restore jobs allowed.

I wonder if it'd be worth making this configurable for pg_upgrade as an
escape hatch in case the default setting is problematic.
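
To make the last quoted point concrete, here's how I read the proposed
adjustment (illustrative arithmetic only, not pg_upgrade's actual code):

    /*
     * Illustrative only: bound the number of objects covered by
     * concurrently open transactions by dividing the hard-wired batch
     * size across the parallel restore jobs.
     */
    #include <stdio.h>

    #define RESTORE_TRANSACTION_SIZE 1000

    int
    main(void)
    {
        int         jobs = 4;   /* e.g. pg_upgrade --jobs=4 */
        int         txn_size = RESTORE_TRANSACTION_SIZE / jobs;

        printf("pg_restore ... --transaction-size=%d\n", txn_size);
        return 0;
    }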

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
