From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: CREATE DATABASE with filesystem cloning |
Date: | 2023-10-09 23:48:27 |
Message-ID: | 20231009234827.k5t2iz4bss7dwanp@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2023-10-07 18:51:45 +1300, Thomas Munro wrote:
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows). Then extra physical disk space will be consumed only as the
> two clones diverge.
> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick. If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
>
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR. Which makes me wonder if it's worth pursuing this
> idea. Thoughts?
I think it'd be interesting to have. For the regression tests we do end up
spending a lot of disk throughput on contents duplicated between
template0/template1/postgres. And I've spent plenty of time copying huge
template databases, to have a reproducible starting point for some benchmark
that's expensive to initialize.
If we do this, I think we should consider creating template0, template1 with
the new strategy, so that a new initdb cluster ends up with deduplicated data.
FWIW, I experimented with using cp -c on macOS for the initdb template, and
that provided some further gain. I suspect the gain would increase if
template0/template1/postgres were deduplicated.
> diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
> index e04bc3941a..8c963ff548 100644
> --- a/src/backend/storage/file/copydir.c
> +++ b/src/backend/storage/file/copydir.c
> @@ -19,14 +19,21 @@
> #include "postgres.h"
>
> #include <fcntl.h>
> +#include <limits.h>
> #include <unistd.h>
>
> +#ifdef HAVE_COPYFILE_H
> +#include <copyfile.h>
> +#endif
We already have code around this in src/bin/pg_upgrade/file.c, seems we ought
to move it somewhere in src/port?
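To make that concrete, here is a rough sketch of what a shared helper could look like. It is not the code from pg_upgrade or from the patch. The names clone_or_copy_file() and copy_fd_fallback() are made up for illustration, the choice of copyfile() with COPYFILE_CLONE on macOS and copy_file_range() on Linux is an assumption, and it overwrites an existing destination, which real code may not want.

```c
/* Hypothetical sketch only; function names are illustrative, not PostgreSQL's. */
#define _GNU_SOURCE				/* needed for copy_file_range() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef __APPLE__
#include <copyfile.h>
#endif

/* Plain read()/write() loop, continuing from the current file offsets. */
static int
copy_fd_fallback(int srcfd, int dstfd)
{
	char		buf[65536];
	ssize_t		nread;

	while ((nread = read(srcfd, buf, sizeof(buf))) != 0)
	{
		if (nread < 0)
		{
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < nread;)
		{
			ssize_t		nwritten = write(dstfd, buf + off, (size_t) (nread - off));

			if (nwritten < 0)
			{
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += nwritten;
		}
	}
	return 0;
}

/* Returns 0 on success, -1 on failure with errno set. */
int
clone_or_copy_file(const char *src, const char *dst)
{
	int			srcfd;
	int			dstfd;
	int			rc = 0;
	int			use_fallback = 1;

	assert(src != NULL && dst != NULL);

#ifdef __APPLE__
	/* COPYFILE_CLONE tries a clone first and falls back to a copy itself. */
	if (copyfile(src, dst, NULL, COPYFILE_CLONE) == 0)
		return 0;
	/* On failure (e.g. dst already exists), try the generic path below. */
#endif

	srcfd = open(src, O_RDONLY);
	if (srcfd < 0)
		return -1;
	dstfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (dstfd < 0)
	{
		int			save_errno = errno;

		close(srcfd);
		errno = save_errno;
		return -1;
	}

#ifdef __linux__
	/* NULL offsets: the kernel advances both file positions as it copies. */
	for (;;)
	{
		ssize_t		n = copy_file_range(srcfd, NULL, dstfd, NULL,
										(size_t) 1 << 30, 0);

		if (n == 0)
		{
			use_fallback = 0;	/* reached end of file */
			break;
		}
		if (n < 0 && errno != EINTR)
			break;				/* e.g. ENOSYS or EXDEV: use the plain loop */
	}
#endif

	/* Both offsets moved in step, so the plain loop resumes where we stopped. */
	if (use_fallback)
		rc = copy_fd_fallback(srcfd, dstfd);

	if (close(dstfd) != 0)
		rc = -1;
	close(srcfd);
	return rc;
}
```

A caller would just do clone_or_copy_file("base/1/1259", "base/16384/1259") (paths made up) and check for a zero return. Whether a failed clone should silently degrade to a copy, or raise an error the way a forced clone would, is a policy question the sketch does not try to answer.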
Greetings,
Andres Freund