From: | Dave Peticolas <dave(at)krondo(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: WAL replay issue from 9.6.8 to 9.6.10 |
Date: | 2018-08-30 03:19:07 |
Message-ID: | CAPRbp06vV51ZgDFYpWjv0pwN9oMwRh_SR=RBVhvDeWiwViH2vA@mail.gmail.com |
Lists: | pgsql-general |
On Wed, Aug 29, 2018 at 1:50 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Wed, Aug 29, 2018 at 09:15:29AM -0700, Dave Peticolas wrote:
> > Oh, perhaps I do, depending on what you mean by worker. There are a couple
> > of periodic processes that connect to the server to obtain metrics. Is that
> > what is triggering this issue? In my case I could probably suspend them
> > until the replay has reached the desired point.
>
> That would be it. How do you decide when those begin to run and connect
> to Postgres? Do you use pg_isready or similar in a loop for sanity
> checks?
>
I do not; they just try to connect and bail if they cannot.
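
For reference, the kind of pg_isready loop asked about above could look roughly like the sketch below. It is only an illustration, not something from my setup: the function names, host, port, and retry counts are all made up, and I have not confirmed that gating the metrics processes this way avoids the PANIC. It relies only on pg_isready's documented exit codes (0 accepting connections, 1 rejecting, 2 no response, 3 no attempt made).

```python
import subprocess
import time
from typing import Callable


def pg_isready_status(host: str = "localhost", port: int = 5432) -> int:
    """Run pg_isready and return its exit code.

    0 = accepting connections, 1 = rejecting (e.g. still starting up),
    2 = no response, 3 = no attempt made (bad parameters).
    Host and port here are placeholder assumptions.
    """
    result = subprocess.run(
        ["pg_isready", "-h", host, "-p", str(port), "-t", "3"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def wait_until_ready(
    check: Callable[[], int] = pg_isready_status,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it reports 0 (accepting connections).

    Returns True as soon as the server accepts connections, or False
    if it never does within `attempts` polls.
    """
    for _ in range(attempts):
        if check() == 0:
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical usage: only start the metrics collector once the
    # standby reports that it accepts connections.
    if wait_until_ready():
        print("server is accepting connections; start metrics collection")
    else:
        print("server never became ready; skipping this run")
```

The check is passed in as a callable so the polling logic can be exercised without a running server.
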
> > I have noticed this behavior in the past but prior to 9.6.10 restarting the
> > server would fix the issue. And the replay always seemed to reach a point
> > past which the problem would not re-occur.
>
> You are piquing my interest here. Did you actually see the same
> problem? In 9.6.10 what happens is that I have tightened the consistent
> point checks and logic so that inconsistent page issues would actually
> show up when they should, and so that those become reproducible and we
> can track down any rogue WAL record or inconsistent behavior.
>
Yes, I've seen this problem occasionally in the past. I think only in the
9.6 series. But before 9.6.10, if I restarted the server it would start
replaying WAL again and typically when it reached the point where it
PANICed before, instead it would report a consistent state and allow
read-only connections. Sometimes it would then PANIC again after more WAL
was replayed. But eventually it would reach a point where it seemed to be
able to replay WAL indefinitely without the issue happening.
dave