Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Chengchao Yu <chengyu(at)microsoft(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Prabhat Tripathi <ptrip(at)microsoft(dot)com>, Sunil Kamath <Sunil(dot)Kamath(at)microsoft(dot)com>, Michal Primke <mprimke(at)microsoft(dot)com>, TEJA Mupparti <Tejeswar(dot)Mupparti(at)microsoft(dot)com>
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Date: 2019-02-02 02:57:41
Message-ID: CAA4eK1+QmK_n2VkT-U-xDorZzJaUNyfdRze0H1-aJ2jE0MpW1A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chengyu(at)microsoft(dot)com> wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedback! Apologies, but I just saw both messages.
>
> > We generally reserve the space in a relation before attempting to write, so I am not sure how you are able to hit the disk-full situation via mdwrite. If you look at the description of the function, it indicates the same.
>
> Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft because our own storage layer treats mdextend() as only setting the length of the file. We have a workaround, and no change is needed in Postgres.
>
> > I am not saying that mdwrite can never lead to an error, but just trying to understand the issue you actually faced. I haven't read your proposed solution yet; let's first try to establish the problem you are facing.
>
> We see transient IO errors when reading a block, after which the PG instance deadlocks in single user mode until we kill the instance. The stack trace below shows the behavior, which occurs when mdread() fails while the buffer's LWLock is still held. This happens because in single user mode there is no callback to release the lock, so when AbortBufferIO() tries to acquire the same lock again, it waits for the lock indefinitely.
>

I think you can register your patch for next CF [1] so that we don't
forget about it.

[1] - https://commitfest.postgresql.org/22/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
