From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date: | 2021-06-17 05:45:13 |
Message-ID: | CAFiTN-uB3a1CTbzN9mQwK+n0DBLH3tWonGR_joBxy87UNJ-Kag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
()log_newpage()On Thu, Jun 17, 2021 at 3:28 AM Andres Freund
<andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-06-15 16:50:24 +0530, Dilip Kumar wrote:
> > The patch modifies both CREATE DATABASE and ALTER DATABASE..SET
> > TABLESPACE to be fully WAL-logged.
>
> Generally quite a bit in favor of this - the current approach is very
> heavyweight, slow and I think we have a few open corner bugs related to
> it.
Great!
>
> > Design Idea:
> > -----------------
> > First, create the target database directory along with the version
> > file and WAL-log this operation.
>
> What happens if you crash / promote at this point?
I will check this.
> > Create the "relation map file" in the target database and copy the
> > content from the source database. For this, we can use some modified
> > versions of the write_relmap_file() and WAL-log the relmap create
> > operation along with the file content. Now, read the relmap file to
> > find the relfilenode for pg_class and then we read pg_class block by
> > block and decode the tuples.
>
> This doesn't seem like a great approach - you're not going to be able to
> use much of the normal infrastructure around processing tuples. So it
> seems like it'd end up with quite a bit of special case code that needs
> to maintained in parallel.
Yeah, this needs some special-purpose code but it is not too much
code. I agree that instead of scanning the pg_class we can scan all
the tablespaces and under that identify the source database directory
as we do now. And from there we can copy each relfilenode block by
block with wal log. Honestly, these both seem like a special-purpose
code. Another problem with directly scanning the directory is, how we
are supposed to get the "relpersistence" which is stored in pg_class
tuple right?
>
> > Now read the source relfilenode block by block using
> > ReadBufferWithoutRelCache() and copy the block to the target
> > relfilenode using smgrextend() and WAL-log them using log_newpage().
> > For the source database, we can not directly use the smgrread(),
> > because there could be some dirty buffers so we will have to read them
> > through the buffer manager interface, otherwise, we will have to flush
> > all the dirty buffers.
>
> I think we might need a bit more batching for the WAL logging. There are
> cases of template database considerably bigger than the default and the
> overhead of logging each write separately seems likely to be noticable.
Yeah, we can do that, and instead of using log_newpage() we can use
log_newpages(), to log multiple pages at once.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Yugo NAGATA | 2021-06-17 06:17:40 | Re: pgbench logging broken by time logic changes |
Previous Message | Mark Dilger | 2021-06-17 05:43:19 | Re: Fix for segfault in logical replication on master |