From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thunder <thunder1(at)126(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node |
Date: | 2019-10-03 04:49:34 |
Message-ID: | CAHGQGwHCK6f77yeZD4MHOnN+PaTf6XiJfEB+Ce7SksSHjeAWtg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Sep 27, 2019 at 3:14 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> > On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> This also points out that there are other things to worry about than
> >> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> >> to an ERROR, and this happens before the physical truncation is done
> >> but after the WAL record is replayed on the standby, so any failures
> >> happening at the truncation phase before the work is done would be a
> >> problem. However we are talking about failures which should not
> >> happen and these are elog() calls. It would be tempting to add a
> >> critical section here, but we could still have problems if we have a
> >> failure after the WAL record has been flushed, which means that it
> >> would be replayed on the standby, and the surrounding comments are
> >> clear about that.
> >
> > Could you elaborate what problem adding a critical section there occurs?
>
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no? If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.
Thanks for elaborating that! Understood.
But in that case, subsequent recovery can fail every time with an
invalid-pages error, and the server can never start up. This is bad. So, to
alleviate the situation, I'm thinking it would be worth adding something like
an ignore_invalid_pages developer parameter. When this parameter is set to
true, the startup process always ignores invalid-pages errors. Thoughts?
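
To make the proposed behavior concrete, here is a rough standalone sketch.
It is a toy model, not a patch against the PostgreSQL source: the function
check_invalid_pages(), the CheckResult enum, and the global flag are all
hypothetical names, and the messages only imitate what the startup process
might report. The assumption is that the parameter downgrades the failure to
a warning and lets recovery continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed developer parameter. */
bool ignore_invalid_pages = false;

/* Outcome of the toy consistency check (names are illustrative only). */
typedef enum
{
    CHECK_OK,      /* no invalid-page references remain */
    CHECK_WARNED,  /* references remain, but were ignored */
    CHECK_FATAL    /* references remain, recovery must stop */
} CheckResult;

/*
 * Toy model of the check made during recovery: n_invalid is the number of
 * WAL references to pages that do not exist on disk.
 */
CheckResult
check_invalid_pages(int n_invalid)
{
    assert(n_invalid >= 0);

    if (n_invalid == 0)
        return CHECK_OK;

    if (ignore_invalid_pages)
    {
        /* Proposed behavior: report the problem and keep going. */
        fprintf(stderr, "WARNING: ignoring %d invalid page reference(s)\n",
                n_invalid);
        return CHECK_WARNED;
    }

    /* Behavior without the parameter: recovery fails and startup stops. */
    fprintf(stderr, "PANIC: WAL contains references to invalid pages\n");
    return CHECK_FATAL;
}
```

With the flag left at false, a call such as check_invalid_pages(2) reports
the fatal case. Setting it to true turns the same call into a warning, which
is the escape hatch for a server that otherwise cannot start.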
Regards,
--
Fujii Masao