From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Relation bulk write facility |
Date: | 2024-07-01 20:52:50 |
Message-ID: | 20240701205250.62.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> Committed this. Thanks everyone!
Commit 8af2565 wrote:
> --- /dev/null
> +++ b/src/backend/storage/smgr/bulk_write.c
> +/*
> + * Finish bulk write operation.
> + *
> + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
> + * the relation if needed.
> + */
> +void
> +smgr_bulk_finish(BulkWriteState *bulkstate)
> +{
> + /* WAL-log and flush any remaining pages */
> + smgr_bulk_flush(bulkstate);
> +
> + /*
> + * When we wrote out the pages, we passed skipFsync=true to avoid the
> + * overhead of registering all the writes with the checkpointer. Register
> + * the whole relation now.
> + *
> + * There is one hole in that idea: If a checkpoint occurred while we were
> + * writing the pages, it already missed fsyncing the pages we had written
> + * before the checkpoint started. A crash later on would replay the WAL
> + * starting from the checkpoint, therefore it wouldn't replay our earlier
> + * WAL records. So if a checkpoint started after the bulk write, fsync
> + * the files now.
> + */
> + if (!SmgrIsTemp(bulkstate->smgr))
> + {
Shouldn't this be "if (bulkstate->use_wal)"? The GetRedoRecPtr()-based
decision is irrelevant to the !wal case. Either we don't need fsync at all
(TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal). I
don't see any functional problem, but this likely arranges for an unnecessary
sync when a checkpoint starts between mdcreate() and here. (The mdcreate()
sync may also be unnecessary, but that's longstanding.)
> + /*
> + * Prevent a checkpoint from starting between the GetRedoRecPtr() and
> + * smgrregistersync() calls.
> + */
> + Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
> + MyProc->delayChkptFlags |= DELAY_CHKPT_START;
> +
> + if (bulkstate->start_RedoRecPtr != GetRedoRecPtr())
> + {
> + /*
> + * A checkpoint occurred and it didn't know about our writes, so
> + * fsync() the relation ourselves.
> + */
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + smgrimmedsync(bulkstate->smgr, bulkstate->forknum);
> + elog(DEBUG1, "flushed relation because a checkpoint occurred concurrently");
> + }
> + else
> + {
> + smgrregistersync(bulkstate->smgr, bulkstate->forknum);
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + }
> + }
> +}
This is an elegant optimization.
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2024-07-01 21:53:05 | Re: Relation bulk write facility |
Previous Message | Tomas Vondra | 2024-07-01 20:20:41 | Re: WIP: parallel GiST index builds |