Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-03-08 21:42:08
Message-ID: CA+TgmoZVBcEFZ+8DKOm5qf0X_SrXqKfTL5JK2O2=BA=KFkwdvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 3, 2022 at 11:22 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> The new version of the patch fixes these 2 comments pointed by Ashutosh and also splits the GetRelListFromPage() function as suggested by Robert and uses the latest snapshot for scanning the pg_class instead of active snapshot as pointed out by Robert. I haven't yet added the test case to create a database using this new strategy option. So if we are okay with these two options FILE_COPY and WAL_LOG then I will add test cases for the same.

Reviewing 0001, the boundaries of the critical section move slightly,
but only over a memcpy, which can't fail, so that seems fine. But this
comment looks ominous:

* Note: we're cheating a little bit here by assuming that mapped files
* are either in pg_global or the database's default tablespace.

It's not clear to me how the code that follows relies on this
assumption, but the overall patch set would make that not true any
more, so there's some kind of an issue to think about there.

It's a little asymmetric that load_relmap_file() gets a subroutine
read_relmap_file() while write_relmap_file() gets a subroutine
write_relmap_file_internal(). Perhaps we could call the functions
{load,write}_named_relmap_file() or something of that sort.

Reviewing 0002, your comment updates in relmap_redo() are not
complete. Note that there's an unmodified comment that says "Write out
the new map and send sinval" just above where you modify the code to
only conditionally send sinval. I'm somewhat uncomfortable with the
shape of this logic, too. It looks weird to be sometimes calling
write_relmap_file and sometimes write_relmap_file_internal. You'd
expect functions with those names to be called at different
abstraction levels, rather than at parallel call sites. The renaming I
proposed would help with this but it's not just a cosmetic issue: the
logic to construct mapfilename is in this function in one case, but in
the called function in the other case. I can't help but think that the
write_relmap_file()/write_relmap_file_internal() split isn't entirely
the right thing.

I think part of the confusion here is that, pre-patch,
write_relmap_file() gets called during both recovery and normal
running, and the updates to shared_map or local_map are actually
nonsense during recovery, because the local map at least is local to
whatever our database is, and we don't have a database connection if
we're the startup process. After your patch, we're still going through
write_relmap_file in recovery in some cases, but really those map
updates don't seem like things that should be happening at all. And on
the other hand it's not clear to me why the CRC stuff isn't needed in
all cases, but that's only going to happen when we go through the
non-internal version of the function. You've probably spent more time
looking at this code than I have, but I'm wondering if the division
should be like this: we have one function that does the actual update,
and another function that does the update plus sets global variables.
Recovery always uses the first one, and outside of recovery we use the
first one for the create-database case and the second one otherwise.
Thoughts?

Regarding 0003, my initial thought was to like the fact that you'd
avoided duplicating code by using a function parameter, but as I look
at it a bit more, it's not clear to me that it's enough code that we
really care about not duplicating it. I would not expect to find a
function called RelationCopyAllFork() in tablecmds.c. I'd expect to
find it in storage.c, I think. And I think I'd be surprised to find
out that it doesn't actually know anything about copying; it's
basically just a loop over the forks to which you can supply your own
copy-function. And the fact that it's got an argument of type
copy_relation_storage and the argument name is copy_storage and the
value is sometimes RelationCopyStorageis a terminological muddle, too.
So I feel like perhaps this needs more thought.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-03-08 22:12:53 Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Previous Message Andres Freund 2022-03-08 21:17:16 Re: WIP: WAL prefetch (another approach)