From: | Amul Sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Patch] ALTER SYSTEM READ ONLY |
Date: | 2020-12-14 14:33:32 |
Message-ID: | CAAJ_b96Fia+XQHvkpDcLGvhJ5iK0dZZnPZFoi69REWb2w6ga+w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 14, 2020 at 11:28 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Thu, Dec 10, 2020 at 6:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> > > That's not good. On a typical busy system, a system is going to be in
> > > the middle of a checkpoint most of the time, and the checkpoint will
> > > take a long time to finish - maybe minutes.
> >
> > Or hours, even. Due to the cost of FPWs it can make a lot of sense to
> > reduce the frequency of that cost...
> >
> >
> > > We want this feature to respond within milliseconds or a few seconds,
> > > not minutes. So we need something better here.
> >
> > Indeed.
> >
> >
> > > I'm inclined to think
> > > that we should try to CompleteWALProhibitChange() at the same places
> > > we AbsorbSyncRequests(). We know from experience that bad things
> > > happen if we fail to absorb sync requests in a timely fashion, so we
> > > probably have enough calls to AbsorbSyncRequests() to make sure that
> > > we always do that work in a timely fashion. So, if we do this work in
> > > the same place, then it will also be done in a timely fashion.
> >
> > Sounds sane, without having looked in detail.
> >
>
> Understood & agreed that we need to change the system state as soon as possible.
>
> I can see AbsorbSyncRequests() is called from 4 routing as
> CheckpointWriteDelay(), ProcessSyncRequests(), SyncPostCheckpoint() and
> CheckpointerMain(). Out of 4, the first three executes with an interrupt is on
> hod which will cause a problem when we do emit barrier and wait for those
> barriers absorption by all the process including itself and will cause an
> infinite wait. I think that can be fixed by teaching WaitForProcSignalBarrier(),
> do not wait on self to absorb barrier. Let that get absorbed at a later point
> in time when the interrupt is resumed. I assumed that we cannot do barrier
> processing right away since there could be other barriers (maybe in the future)
> including ours that should not process while the interrupt is on hold.
>
CreateCheckPoint() holds CheckpointLock LW at start and releases at the end
which puts interrupt on hold. This kinda surprising that we were holding this
lock and putting interrupt on hots for a long time. We do need that
CheckpointLock just to ensure that one checkpoint happens at a time. Can't we do
something easy to ensure that instead of the lock? Probably holding off
interrupts for so long doesn't seem to be a good idea. Thoughts/Suggestions?
Regards,
Amul
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-12-14 14:33:53 | Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit |
Previous Message | Ashutosh Bapat | 2020-12-14 14:09:36 | Re: Insert Documentation - Returning Clause and Order |