Re: Large files for relations

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Large files for relations
Date: 2023-05-12 13:53:32
Message-ID: ZF5E3E79OQHM4jAx@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Dagfinn Ilmari Mannsåker (ilmari(at)ilmari(dot)org) wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > On Fri, May 12, 2023 at 8:16 AM Jim Mlodgenski <jimmy76(at)gmail(dot)com> wrote:
> >> On Mon, May 1, 2023 at 9:29 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >>> I am not aware of any modern/non-historic filesystem[2] that can't do
> >>> large files with ease. Anyone know of anything to worry about on that
> >>> front?
> >>
> >> There is some trouble in the ambiguity of what we mean by "modern" and
> >> "large files". There are still a large number of users of ext4 where
> >> the max file size is 16TB. Switching to a single large file per
> >> relation would effectively cut the max table size in half for those
> >> users. How would a user with say a 20TB table running on ext4 be
> >> impacted by this change?
> […]
> > A less aggressive version of the plan would be that we just keep the
> > segment code for the foreseeable future with no planned cut off, and
> > we make all of those "piggy back" transformations that I showed in the
> > patch set optional. For example, I had it so that CLUSTER would
> > quietly convert your relation to large format, if it was still in
> > segmented format (might as well if you're writing all the data out
> > anyway, right?), but perhaps that could depend on a GUC. Likewise for
> > base backup. Etc. Then someone concerned about hitting the 16TB
> > limit on ext4 could opt out. Or something like that. It seems funny
> > though, that's exactly the user who should want this feature (they
> > have 16,000 relation segment files).
>
> If we're going to have to keep the segment code for the foreseeable
> future anyway, could we not get most of the benefit by increasing the
> segment size to something like 1TB? The vast majority of tables would
> fit in one file, and there would be less risk of hitting filesystem
> limits.

While I tend to agree that 1GB is too small, 1TB seems likely to end up
on the too-big side of things. At the very least, if we aren't getting
rid of the segment code, then 1TB possibly throws away the benefits we
get from smaller segments without really giving us all that much in
return. Going from 1GB to 10GB would reduce the number of open file
descriptors by quite a lot without much of a net change in other
respects. 50GB or 100GB would reduce the file descriptor count further,
but starts to cost us more of the nice parts of having multiple
segments.
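
To put rough numbers on that, here is a minimal sketch of the arithmetic,
assuming binary units (1GB = 2^30 bytes) and the 20TB table from Jim's
example. The function name and the list of candidate sizes are
illustrative only. It counts main-fork segment files, which is only an
upper bound on the descriptors a backend would actually hold open for
that relation.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def segments_needed(relation_bytes: int, segment_bytes: int) -> int:
    """Segment files needed to hold a relation (ceiling division)."""
    return -(-relation_bytes // segment_bytes)


if __name__ == "__main__":
    relation = 20 * TIB  # assumption: the 20TB table from upthread
    candidates = [
        ("1GB", GIB),
        ("10GB", 10 * GIB),
        ("50GB", 50 * GIB),
        ("100GB", 100 * GIB),
        ("1TB", TIB),
    ]
    for label, segment in candidates:
        count = segments_needed(relation, segment)
        print(f"{label:>5} segments: {count} files")
```

Most of the reduction in file count already comes from the first 10x
step; each later step removes far fewer files in absolute terms.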

Just some thoughts.

Thanks,

Stephen
