From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: USE_BARRIER_SMGRRELEASE on Linux? |
Date: | 2022-02-16 17:37:21 |
Message-ID: | 20220216173721.GA3007497@nathanxps13 |
Lists: | pgsql-hackers |

On Wed, Feb 16, 2022 at 08:44:42AM -0800, Nathan Bossart wrote:
> On Tue, Feb 15, 2022 at 10:57:32PM -0800, Nathan Bossart wrote:
>> On Tue, Feb 15, 2022 at 10:14:04PM -0800, Nathan Bossart wrote:
>>> It looks like register_unlink_segment() is called prior to the checkpoint,
>>> but the checkpointer is not calling RememberSyncRequest() until after
>>> SyncPreCheckpoint(). This means that the requests are registered with the
>>> next checkpoint cycle count, so they aren't processed until the next
>>> checkpoint.
>>
>> Calling AbsorbSyncRequests() before advancing the checkpoint cycle counter
>> seems to fix the issue. However, this requires moving SyncPreCheckpoint()
>> out of the critical section in CreateCheckPoint(). Patch attached.
>
> An alternative fix might be to call AbsorbSyncRequests() after increasing
> the ckpt_started counter in CheckpointerMain(). AFAICT there is a window
> just before checkpointing where new requests are registered for the
> checkpoint following the one about to begin.

Here's a patch that adds a call to AbsorbSyncRequests() in
CheckpointerMain() instead of SyncPreCheckpoint(). I've also figured out a
way to reproduce the issue without the pre-allocation patches applied:

  1. In checkpointer.c, add a 30 second sleep before acquiring ckpt_lck to
     increment ckpt_started.
  2. In session 1, run the following commands:
       a. CREATE TABLESPACE test LOCATION '/path/to/dir';
       b. CREATE TABLE test TABLESPACE test AS SELECT 1;
  3. In session 2, start a checkpoint.
  4. In session 1, run these commands:
       a. ALTER TABLE test SET TABLESPACE pg_default;
       b. DROP TABLESPACE test;  -- fails
       c. DROP TABLESPACE test;  -- succeeds

With the attached patch applied, the first attempt at dropping the
tablespace no longer fails.
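
For anyone trying to follow the cycle-counter reasoning quoted above, a toy model in C may help. It is only a sketch and not the attached patch: it does not use PostgreSQL's real data structures, and every name in it (ToyCheckpointer, toy_absorb, toy_checkpoint, toy_checkpoints_until_unlinked) is hypothetical. The one assumption it encodes is the behavior described in this thread: an unlink request is tagged with the cycle counter at the time it is absorbed, and a checkpoint only processes requests tagged with an earlier cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not PostgreSQL's actual API. */
#define MAX_REQUESTS 8

typedef struct ToyCheckpointer
{
    int cycle_ctr;                   /* stands in for the checkpoint cycle counter */
    int queued;                      /* unlink requests forwarded, not yet absorbed */
    int pending_cycle[MAX_REQUESTS]; /* cycle tag of each absorbed request */
    int npending;
} ToyCheckpointer;

/* Absorb queued requests, tagging each with the current cycle counter. */
static void
toy_absorb(ToyCheckpointer *c)
{
    while (c->queued > 0)
    {
        assert(c->npending < MAX_REQUESTS);
        c->pending_cycle[c->npending++] = c->cycle_ctr;
        c->queued--;
    }
}

/*
 * Run one checkpoint and return how many unlink requests it processed.
 * absorb_first selects whether queued requests are absorbed before the
 * cycle counter advances, or only afterwards.
 */
static int
toy_checkpoint(ToyCheckpointer *c, bool absorb_first)
{
    int processed = 0;
    int kept = 0;

    if (absorb_first)
        toy_absorb(c);

    c->cycle_ctr++;   /* pre-checkpoint step: advance the cycle counter */
    toy_absorb(c);    /* anything absorbed here carries the new counter */

    /* Post-checkpoint step: only requests from earlier cycles are processed. */
    for (int i = 0; i < c->npending; i++)
    {
        if (c->pending_cycle[i] != c->cycle_ctr)
            processed++;
        else
            c->pending_cycle[kept++] = c->pending_cycle[i];
    }
    c->npending = kept;
    return processed;
}

/* Queue one unlink request, then count checkpoints until it is processed. */
static int
toy_checkpoints_until_unlinked(bool absorb_first)
{
    ToyCheckpointer c = {0};
    int n = 0;

    c.queued = 1;
    do
    {
        n++;
    } while (toy_checkpoint(&c, absorb_first) == 0);
    return n;
}
```

In this model, toy_checkpoints_until_unlinked(false) takes two checkpoints to process the request, which corresponds to the first DROP TABLESPACE failing in the reproduction steps, while toy_checkpoints_until_unlinked(true) takes one.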
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v2-0001-call-AbsorbSyncRequests-after-indicating-checkpoi.patch | text/x-diff | 1.4 KB |