From: | Amul Sul <sulamul(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Patch] ALTER SYSTEM READ ONLY |
Date: | 2020-10-28 11:43:38 |
Message-ID: | CAAJ_b97pC5MRwG42c9StkZSNYXh9Lq8YittY8NZLcgUcB1LfjA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Oct 8, 2020 at 3:52 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Wed, Oct 7, 2020 at 11:19 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Wed, Sep 16, 2020 at 3:33 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > I don't mind a loop, but that one looks broken. We have to clear the
> > > bit before we call the function that processes that type of barrier.
> > > Otherwise, if we succeed in absorbing the barrier but a new instance
> > > of the same barrier arrives meanwhile, we'll fail to realize that we
> > > need to absorb the new one.
> >
> > Here's a new version of the patch for allowing errors in
> > barrier-handling functions and/or rejection of barriers by those
> > functions. I think this responds to all of the previous review
> > comments from Andres. Also, here is an 0002 which is a handy bit of
> > test code that I wrote. It's not for commit, but it is useful for
> > finding bugs.
> >
> > In addition to improving 0001 based on the review comments, I also
> > tried to write a better commit message for it, but it might still be
> > possible to do better there. It's a bit hard to explain the idea in
> > the abstract. For ALTER SYSTEM READ ONLY, the idea is that a process
> > with an XID -- and possibly a bunch of sub-XIDs, and possibly while
> > idle-in-transaction -- can elect to FATAL rather than absorbing the
> > barrier. I suspect for other barrier types we might have certain
> > (hopefully short) stretches of code where a barrier of a particular
> > type can't be absorbed because we're in the middle of doing something
> > that relies on the previous value of whatever state is protected by
> > the barrier. Holding off interrupts in those stretches of code would
> > prevent the barrier from being absorbed, but would also prevent query
> > cancel, backend termination, and absorption of other barrier types, so
> > it seems possible that just allowing the barrier-absorption function
> > for a barrier of that type to just refuse the barrier until after the
> > backend exits the critical section of code will work out better.
> >
> > Just for kicks, I tried running 'make installcheck-parallel' while
> > emitting placeholder barriers every 0.05 s after altering the
> > barrier-absorption function to always return false, just to see how
> > ugly that was. In round figures, it made it take 24 s vs. 21 s, so
> > it's actually not that bad. However, it all depends on how many times
> > you hit CHECK_FOR_INTERRUPTS() how quickly, so it's easy to imagine
> > that the effect might be very non-uniform. That is, if you can get the
> > code to be running a tight loop that does little real work but does
> > CHECK_FOR_INTERRUPTS() while refusing to absorb outstanding type of
> > barrier, it will probably suck. Therefore, I'm inclined to think that
> > the fairly strong cautionary logic in the patch is reasonable, but
> > perhaps it can be better worded somehow. Thoughts welcome.
> >
> > I have not rebased the remainder of the patch series over these two.
> >
> That I'll do.
>
Attaching a rebased version includes Robert's patches for the latest master
head.
> On a quick look at the latest 0001 patch, the following hunk to reset leftover
> flags seems to be unnecessary:
>
> + /*
> + * If some barrier types were not successfully absorbed, we will have
> + * to try again later.
> + */
> + if (!success)
> + {
> + ResetProcSignalBarrierBits(flags);
> + return;
> + }
>
> When the ProcessBarrierPlaceholder() function returns false without an error,
> that barrier flag gets reset within the while loop. The case when it has an
> error, the rest of the flags get reset in the catch block. Correct me if I am
> missing something here.
>
Robert, could you please confirm this?
Regards,
Amul
Attachment | Content-Type | Size |
---|---|---|
v10-0001-Allow-for-error-or-refusal-while-absorbing-barri.patch | application/x-patch | 7.5 KB |
v10-0005-Error-or-Assert-before-START_CRIT_SECTION-for-WA.patch | application/x-patch | 70.1 KB |
v10-0002-Test-module-for-barriers.-NOT-FOR-COMMIT.patch | application/x-patch | 3.9 KB |
v10-0003-Add-alter-system-read-only-write-syntax.patch | application/x-patch | 9.3 KB |
v10-0004-Implement-ALTER-SYSTEM-READ-ONLY-using-global-ba.patch | application/x-patch | 41.8 KB |
v10-0006-WIP-Documentation.patch | application/x-patch | 6.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2020-10-28 11:50:46 | Re: A new function to wait for the backend exit after termination |
Previous Message | Pavel Borisov | 2020-10-28 11:37:53 | Valgrind run error with Postgres on OSX |