From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Large files for relations |
Date: | 2023-05-09 20:52:49 |
Message-ID: | CADkLM=eLxsMySQwNzYd6G-Xs0C77ExbtpUOXXYGEZVg+FwBeHg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, May 3, 2023 at 1:37 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Wed, May 3, 2023 at 5:21 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> wrote:
> > rsync --link-dest
>
> I wonder if rsync will grow a mode that can use copy_file_range() to
> share blocks with a reference file (= previous backup). Something
> like --copy-range-dest. That'd work for large-file relations
> (assuming a file system that has block sharing, like XFS and ZFS).
> You wouldn't get the "mtime is enough, I don't even need to read the
> bytes" optimisation, which I assume makes all database hackers feel a
> bit queasy anyway, but you'd get the space savings via the usual
> rolling checksum or a cheaper version that only looks for strong
> checksum matches at the same offset, or whatever other tricks rsync
> might have up its sleeve.
>

I understand the need to reduce open file handles, despite the
possibilities enabled by using large numbers of small files.
Snowflake, for instance, sees everything in 1MB chunks, which makes
massively parallel sequential scans (Snowflake's _only_ query plan)
possible, though I don't know if they accomplish that via separate files,
or via segments within a large file.

I am curious whether a move like this, which creates a generational change
in the file format, shouldn't be more ambitious, perhaps altering the block
format to insert a block format version number, whether that be at every
block, or every megabyte, or some other interval, and whether we store it
in-file or in a separate file to accompany the first non-segmented file.
Having such versioning information would allow blocks of different formats
to co-exist in the same table, which could be critical to future changes
such as 64-bit XIDs, etc.
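
To make the versioning idea concrete, here is a rough sketch of the per-block variant, in Python only for brevity. Everything in it is an assumption for illustration: the 8192-byte block size, the 2-byte version field at the start of each block, and the two toy layouts (v1 carrying a 32-bit XID, v2 a 64-bit XID) are hypothetical and are not PostgreSQL's actual page format. The point is only that a reader can dispatch on the version of each block, so both formats can sit in the same file.

```python
import struct

BLOCK_SIZE = 8192  # assumed block size for this sketch

# Hypothetical layouts, little-endian with no padding:
#   v1: 2-byte format version, then a 32-bit XID
#   v2: 2-byte format version, then a 64-bit XID
_VERSION = struct.Struct("<H")
_V1 = struct.Struct("<HI")
_V2 = struct.Struct("<HQ")


def make_block(version, xid):
    """Build one zero-padded block in the requested (hypothetical) format."""
    if version == 1:
        header = _V1.pack(1, xid)
    elif version == 2:
        header = _V2.pack(2, xid)
    else:
        raise ValueError(f"unknown block format version {version}")
    return header.ljust(BLOCK_SIZE, b"\0")


def read_xid(block):
    """Read the version field first, then decode the XID with that layout."""
    (version,) = _VERSION.unpack_from(block, 0)
    if version == 1:
        return _V1.unpack_from(block, 0)[1]
    if version == 2:
        return _V2.unpack_from(block, 0)[1]
    raise ValueError(f"unknown block format version {version}")


def scan_xids(relation_bytes):
    """Walk a relation block by block, whatever mix of formats it holds."""
    return [
        read_xid(relation_bytes[off:off + BLOCK_SIZE])
        for off in range(0, len(relation_bytes), BLOCK_SIZE)
    ]
```

For example, `scan_xids(make_block(1, 1234) + make_block(2, 2**40))` decodes an old-format block and a new-format block from the same byte stream. Storing the version per megabyte or in a side file would change where the lookup happens, not the dispatch itself.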