From: | Amul Sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Patch] ALTER SYSTEM READ ONLY |
Date: | 2021-01-14 11:29:14 |
Message-ID: | CAAJ_b96WBxFcnAWLjweEpX1ZPOtX8oxgc82EBGGjfNSxNoz+Bg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 14, 2020 at 8:03 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Mon, Dec 14, 2020 at 11:28 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 10, 2020 at 6:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> > > > That's not good. On a typical busy system, a system is going to be in
> > > > the middle of a checkpoint most of the time, and the checkpoint will
> > > > take a long time to finish - maybe minutes.
> > >
> > > Or hours, even. Due to the cost of FPWs it can make a lot of sense to
> > > reduce the frequency of that cost...
> > >
> > >
> > > > We want this feature to respond within milliseconds or a few seconds,
> > > > not minutes. So we need something better here.
> > >
> > > Indeed.
> > >
> > >
> > > > I'm inclined to think
> > > > that we should try to CompleteWALProhibitChange() at the same places
> > > > we AbsorbSyncRequests(). We know from experience that bad things
> > > > happen if we fail to absorb sync requests in a timely fashion, so we
> > > > probably have enough calls to AbsorbSyncRequests() to make sure that
> > > > we always do that work in a timely fashion. So, if we do this work in
> > > > the same place, then it will also be done in a timely fashion.
> > >
> > > Sounds sane, without having looked in detail.
> > >
> >
> > Understood & agreed that we need to change the system state as soon as possible.
> >
> > I can see AbsorbSyncRequests() is called from 4 routing as
> > CheckpointWriteDelay(), ProcessSyncRequests(), SyncPostCheckpoint() and
> > CheckpointerMain(). Out of 4, the first three executes with an interrupt is on
> > hod which will cause a problem when we do emit barrier and wait for those
> > barriers absorption by all the process including itself and will cause an
> > infinite wait. I think that can be fixed by teaching WaitForProcSignalBarrier(),
> > do not wait on self to absorb barrier. Let that get absorbed at a later point
> > in time when the interrupt is resumed. I assumed that we cannot do barrier
> > processing right away since there could be other barriers (maybe in the future)
> > including ours that should not process while the interrupt is on hold.
> >
>
> CreateCheckPoint() holds CheckpointLock LW at start and releases at the end
> which puts interrupt on hold. This kinda surprising that we were holding this
> lock and putting interrupt on hots for a long time. We do need that
> CheckpointLock just to ensure that one checkpoint happens at a time. Can't we do
> something easy to ensure that instead of the lock? Probably holding off
> interrupts for so long doesn't seem to be a good idea. Thoughts/Suggestions?
>
To move development, testing, and the review forward, I have commented out the
code acquiring CheckpointLock from CreateCheckPoint() in the 0003 patch and
added the changes for the checkpointer so that system read-write state change
request can be processed as soon as possible, as suggested by Robert[1].
I have started a new thread[2] to understand the need for the CheckpointLock in
CreateCheckPoint() function. Until then we can continue work on this feature by
skipping CheckpointLock in CreateCheckPoint(), and therefore the 0003 patch is
marked WIP.
1] http://postgr.es/m/CA+TgmoYexwDQjdd1=15KMz+7VfHVx8VHNL2qjRRK92P=CSZDxg@mail.gmail.com
2] http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
Regards,
Amul
Attachment | Content-Type | Size |
---|---|---|
v12-0001-Allow-for-error-or-refusal-while-absorbing-barri.patch | application/x-patch | 7.4 KB |
v12-0002-Test-module-for-barriers.-NOT-FOR-COMMIT.patch | application/x-patch | 3.9 KB |
v12-0004-WIP-Implement-ALTER-SYSTEM-READ-ONLY-using-globa.patch | application/x-patch | 46.5 KB |
v12-0005-Error-or-Assert-before-START_CRIT_SECTION-for-WA.patch | application/x-patch | 69.3 KB |
v12-0003-Add-alter-system-read-only-write-syntax.patch | application/x-patch | 9.3 KB |
v12-0006-WIP-Documentation.patch | application/x-patch | 6.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2021-01-14 11:36:07 | Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit |
Previous Message | Josef Šimánek | 2021-01-14 10:31:33 | Re: proposal: schema variables |