Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2021-09-03 10:23:17
Message-ID: CAFiTN-uiY_pBV38ciZCmta_qDgV9b5rGOmLR07XQz2-wrg+7Mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 2, 2021 at 8:52 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Sep 2, 2021 at 2:06 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > 0003- The main patch for WAL logging the created database operation.
>
> Andres pointed out that this approach ends up accessing relations
> without taking a lock on them. It doesn't look like you did anything
> about that.
>

I missed that, I have shared my opinion about this in my last email [1]

>
> + /* Built-in oids are mapped directly */
> + if (classForm->oid < FirstGenbkiObjectId)
> + relfilenode = classForm->oid;
> + else if (OidIsValid(classForm->relfilenode))
> + relfilenode = classForm->relfilenode;
> + else
> + continue;
>
> Am I missing something, or is this totally busted?
>

Oops, I think the condition should be like below, but I will think
carefully before posting the next version if there is something else I am
missing.

if (OidIsValid(classForm->relfilenode))
relfilenode = classForm->relfilenode;
else if if (classForm->oid < FirstGenbkiObjectId)
relfilenode = classForm->oid;
else
continue

> /*
> + * Now drop all buffers holding data of the target database; they should
> + * no longer be dirty so DropDatabaseBuffers is safe.
>
> The way things worked before, this was true, but now AFAICS it's
> false. I'm not sure whether that means that DropDatabaseBuffers() here
> is actually unsafe or whether it just means that you haven't updated
> the comment to explain the reason.
>

I think DropDatabaseBuffers(), itself is unsafe, we just copied pages using
bufmgr and dirtied the buffers so dropping buffers is definitely unsafe
here.

> + * Since we copy the file directly without looking at the shared buffers,
> + * we'd better first flush out any pages of the source relation that are
> + * in shared buffers. We assume no new changes will be made while we are
> + * holding exclusive lock on the rel.
>
> Ditto.
>

Yeah this comment no longer makes sense now.

>
> + /* As always, WAL must hit the disk before the data update does. */
>
> Actually, the way it's coded now, part of the on-disk changes are done
> before WAL is issued, and part are done after. I doubt that's the
> right idea.

There's nothing special about writing the actual payload
> bytes vs. the other on-disk changes (creating directories and files).
> In any case the ordering deserves a better-considered comment than
> this one.
>

Agreed to all, but In general, I think WAL hitting the disk before data is
more applicable for the shared buffers, where we want to flush the WAL
first before writing the shared buffer so that in case of torn page we have
an option to recover the page from previous FPI. But in such cases where we
are creating a directory or file there is no such requirement. Anyways, I
agreed with the comments that it should be more uniform and the comment
should be correct.

+ XLogRegisterData((char *) PG_MAJORVERSION, nbytes);
>
> Surely this is utterly pointless.
>

Yeah it is. During the WAL replay also we know the PG_MAJORVERSION :)

> + CopyDatabase(src_dboid, dboid, src_deftablespace, dst_deftablespace);
> PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
> PointerGetDatum(&fparms));
>
> I'd leave braces around the code for which we're ensuring error
> cleanup even if it's just one line.
>

Okay

> + if (info == XLOG_DBASEDIR_CREATE)
> {
> xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *)
> XLogRecGetData(record);
>
> Seems odd to rename the record but not change the name of the struct.
> I think I would be inclined to keep the existing record name, but if
> we're going to change it we should be more thorough.
>

Right, I think we can leave the record name as it is.

[1]
https://www.postgresql.org/message-id/CAFiTN-sP_6hWv5AxcwnWCgg%3D4hyEeeZcCgFucZsYWpr3XQbP1g%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-09-03 10:26:35 Re: using an end-of-recovery record in all cases
Previous Message Ronan Dunklau 2021-09-03 09:58:27 Re: pg_receivewal starting position