Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-08-03 16:42:05
Message-ID: CAFiTN-vQGKqy5M50HOy4rmmbYNnJAd+6sU=VtFy7=c0tL8QdNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 3, 2022 at 9:32 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Wed, Aug 03, 2022 at 04:45:23PM +0530, Dilip Kumar wrote:
> > Another version of the patch which closes the smgr at the end using
> > smgrcloserellocator() and I have also added a commit message.
>
> Thanks for providing a patch.
> This seems to fix the second problem with accessing freed memory.

Thanks for the confirmation.

> But I reproduced the first problem with a handful of tries interrupting the
> while loop:
>
> 2022-08-03 10:39:50.129 CDT client backend[5530] [unknown] PANIC: could not open critical system index 2662
>
> In the failure, when trying to connect to the new "a" DB, it does this:
>
> [pid 10700] openat(AT_FDCWD, "base/17003/pg_filenode.map", O_RDONLY) = 11
> [pid 10700] read(11, "\27'Y\0\21\0\0\0\353\4\0\0\353\4\0\0\341\4\0\0\341\4\0\0\347\4\0\0\347\4\0\0\337\4\0\0\337\4\0\0\24\v\0\0\24\v\0\0\25\v\0\0\25\v\0\0K\20\0\0K\20\0\0L\20\0\0L\20\0\0\202\n\0\0\202\n\0\0\203\n\0\0\203\n\0\0\217\n\0\0\217\n\0\0\220\n\0\0\220\n\0\0b\n\0\0b\n\0\0c\n\0\0c\n\0\0f\n\0\0f\n\0\0g\n\0\0g\n\0\0\177\r\0\0\177\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\362\366\252\337", 524) = 524
> [pid 10700] close(11) = 0
> [pid 10700] openat(AT_FDCWD, "base/17003/pg_internal.init", O_RDONLY) = -1 ENOENT (No such file or directory)
> [pid 10700] openat(AT_FDCWD, "base/17003/1259", O_RDWR) = 11
> [pid 10700] lseek(11, 0, SEEK_END) = 106496
> [pid 10700] lseek(11, 0, SEEK_END) = 106496
>
> And then reads nothing but zero bytes from FD 11 (rel 1259/pg_class)
>
> So far, I haven't succeeded in eliciting anything useful from valgrind.

I tried multiple times but had no luck with reproducing this issue.
Do you have some logs to know that just before ^C what was the last
query executed and whether it got canceled or executed completely?
Because theoretically, if the create database failed anywhere in
between then it should at least clean the directory of that newly
created database but seems there are some corrupted data in that
directory so seems it is not symptoms of just the create database
failure but some combination of multiple things. I will put more
thought into this tomorrow.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2022-08-03 16:42:11 Re: [doc] fix a potential grammer mistake
Previous Message Robert Haas 2022-08-03 16:41:33 Re: fix typos