Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-08-03 05:58:03
Message-ID: CAFiTN-sG9-ocJFN+KBAQ2mTuF-9Ja+i1Qo81yqr0WDOsqqYpLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 3, 2022 at 3:53 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2022-08-02 17:04:16 -0500, Justin Pryzby wrote:
> > I got this interesting looking thing.
> >
> > ==11628== Invalid write of size 8
> > ==11628== at 0x1D12B3A: smgrsetowner (smgr.c:213)
> > ==11628== by 0x1C7C224: RelationGetSmgr (rel.h:572)
> > ==11628== by 0x1C7C224: RelationCopyStorageUsingBuffer (bufmgr.c:3725)
> > ==11628== by 0x1C7C7A6: CreateAndCopyRelationData (bufmgr.c:3817)
> > ==11628== by 0x14A4518: CreateDatabaseUsingWalLog (dbcommands.c:221)
> > ==11628== by 0x14AB009: createdb (dbcommands.c:1393)
> > ==11628== by 0x1D2B9AF: standard_ProcessUtility (utility.c:776)
> > ==11628== by 0x1D2C46A: ProcessUtility (utility.c:530)
> > ==11628== by 0x1D265F5: PortalRunUtility (pquery.c:1158)
> > ==11628== by 0x1D27089: PortalRunMulti (pquery.c:1315)
> > ==11628== by 0x1D27A7C: PortalRun (pquery.c:791)
> > ==11628== by 0x1D1E33D: exec_simple_query (postgres.c:1243)
> > ==11628== by 0x1D218BC: PostgresMain (postgres.c:4505)
> > ==11628== Address 0x1025bc18 is 2,712 bytes inside a block of size 8,192 free'd
> > ==11628== at 0x4033A3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==11628== by 0x217D7C2: AllocSetReset (aset.c:608)
> > ==11628== by 0x219B57A: MemoryContextResetOnly (mcxt.c:181)
> > ==11628== by 0x217DBD5: AllocSetDelete (aset.c:654)
> > ==11628== by 0x219C1EC: MemoryContextDelete (mcxt.c:252)
> > ==11628== by 0x21A109F: PortalDrop (portalmem.c:596)
> > ==11628== by 0x21A269C: AtCleanup_Portals (portalmem.c:907)
> > ==11628== by 0x11FEAB1: CleanupTransaction (xact.c:2890)
> > ==11628== by 0x120A74C: AbortCurrentTransaction (xact.c:3328)
> > ==11628== by 0x1D2158C: PostgresMain (postgres.c:4232)
> > ==11628== by 0x1B15DB5: BackendRun (postmaster.c:4490)
> > ==11628== by 0x1B1D799: BackendStartup (postmaster.c:4218)
> > ==11628== Block was alloc'd at
> > ==11628== at 0x40327F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==11628== by 0x217F0DC: AllocSetAlloc (aset.c:920)
> > ==11628== by 0x219E4D2: palloc (mcxt.c:1082)
> > ==11628== by 0x14A14BE: ScanSourceDatabasePgClassTuple (dbcommands.c:444)
> > ==11628== by 0x14A1CD8: ScanSourceDatabasePgClassPage (dbcommands.c:384)
> > ==11628== by 0x14A20BF: ScanSourceDatabasePgClass (dbcommands.c:322)
> > ==11628== by 0x14A4348: CreateDatabaseUsingWalLog (dbcommands.c:177)
> > ==11628== by 0x14AB009: createdb (dbcommands.c:1393)
> > ==11628== by 0x1D2B9AF: standard_ProcessUtility (utility.c:776)
> > ==11628== by 0x1D2C46A: ProcessUtility (utility.c:530)
> > ==11628== by 0x1D265F5: PortalRunUtility (pquery.c:1158)
> > ==11628== by 0x1D27089: PortalRunMulti (pquery.c:1315)
>
> Ick. That looks like somehow we end up with smgr entries still pointing to
> fake relcache entries, created in a prior attempt at create database.

The surprising thing is how the smgr entry survived the transaction
abort, I mean AtEOXact_SMgr should have closed the smgr and should
have removed from the
smgr cache.

> Looks like you'd need error trapping to call FreeFakeRelcacheEntry() (or just
> smgrclearowner()) in case of error.
>
> Or perhaps we can instead prevent the fake relcache entry being set as the
> owner in the first place?
>
> Why do we even need fake relcache entries here? Looks like all that they're
> used for is a bunch of RelationGetSmgr() calls? Can't we instead just pass the
> rnode to smgropen()? Given that we're doing that once for every buffer in the
> body of RelationCopyStorageUsingBuffer(), doing it in a bunch of other
> less-frequent places can't be a problem.
> can't

I think in some of the previous versions of the patch we were using
smgropen() but changed it so that we do not reuse the smgr after it
gets removed during interrupt processing, see discussion here[1]

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYKovODW2Y7rQmmRFaKu445p9uAahjpgfbY8eyeL07BXA%40mail.gmail.com

From the Valgrind report, it is clear that we are getting the smgr
entry whose smgr->smgr_owner is pointing into the fake relcache entry.
So I am investigating further how it is possible for the smgr created
during a previous create database attempt to survive beyond abort
transaction.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-08-03 06:21:37 Re: automatically generating node support functions
Previous Message Andrey Borodin 2022-08-03 05:53:15 Re: Slow standby snapshot