Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2021-12-06 17:45:48
Message-ID: CA+Tgmoa9hzdKYEWci9+45X0Z81+BdL6H4Pkaqzsz=vLxWKHX9w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 6, 2021 at 9:23 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> One last point - If we try to clone a huge database, as expected CREATE DATABASE emits a lot of WAL, causing a lot of intermediate checkpoints, which seem to affect the performance slightly.

Yes, I think this needs to be characterized better. If you have a big
shared_buffers setting and a lot of those buffers are dirty and the
template database is small, all of which is fairly normal, then this
new approach should be much quicker. On the other hand, what if the
situation is reversed? Perhaps you have a small shared_buffers setting,
not much of it is dirty, and the template database is gigantic. Then maybe
this new approach will be slower. But right now I think we don't know
where the crossover point is, and I think we should try to figure that
out.

So for example, imagine tests with shared_buffers set to 1GB, 8GB, and
64GB, and template databases with sizes of whatever the default is,
1GB, 10GB, and 100GB. Repeatedly make 75% of the pages dirty and then
create a new database from one of the templates. And then just measure
the performance. Maybe for large databases this approach is just
really the pits -- and if your max_wal_size is too small, it
definitely will be. But, I don't know, maybe with reasonable settings
it's not that bad. Writing everything to disk twice - once to WAL and
once to the target directory - has to be more expensive than doing it
once. But on the other hand, it's all sequential I/O and the data
pages don't need to be fsync'd, so perhaps the overhead is relatively
mild. I don't know.
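
I haven't tried to write this out properly, but a rough sketch of the
kind of test loop I have in mind is below. The shared_buffers values,
the template and database names, and the pgbench settings used to
dirty pages are all just placeholders, and it assumes a pgbench
database has already been initialized:

for sb in 1GB 8GB 64GB; do
    # restart with the shared_buffers setting under test; keep
    # max_wal_size big enough that the copy itself doesn't force
    # checkpoints (a separate run with a small max_wal_size would
    # show the worst case)
    pg_ctl -D "$PGDATA" restart -o "-c shared_buffers=$sb -c max_wal_size=64GB"
    for tmpl in template1 tmpl_1gb tmpl_10gb tmpl_100gb; do
        # dirty a large fraction of shared buffers with an
        # update-heavy run (assumes 'pgbench -i bench' was done)
        pgbench -n -c 8 -T 60 bench
        echo "shared_buffers=$sb template=$tmpl"
        # CREATE DATABASE can't run inside a transaction block,
        # so a single psql -c is fine; time just the copy
        time psql -c "CREATE DATABASE copytest TEMPLATE $tmpl"
        dropdb copytest
    done
done

The interesting number is how the CREATE DATABASE time moves as
shared_buffers and the template size vary, with and without the patch.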

--
Robert Haas
EDB: http://www.enterprisedb.com
