From: | Chris Travers <chris(dot)travers(at)adjust(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Funny hang on PostgreSQL 10 during parallel index scan on slave |
Date: | 2018-09-06 11:44:00 |
Message-ID: | CAN-RpxD07n2ktWL=B-RvYfWG2Q5+fXUeqJkNWf5p0yG2m1p9Nw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 6, 2018 at 1:31 PM Chris Travers <chris(dot)travers(at)adjust(dot)com>
wrote:
>
>
> On Thu, Sep 6, 2018 at 11:08 AM Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
>
>> Ok, so here's my current patch (work in progress). I have not run tests
>> yet, and finding a way to add a test case is virtually impossible though I
>> expect we will find ways of using gdb to confirm local fix later.
>>
>> After careful code review, I settled on the following approach which
>> seemed to be the least intrusive.
>>
>> 1. Change the retry logic so that it does not retry of
>> QueryCancelPending or ProcDiePending are set. In these cases EINTR is
>> passed back to the caller
>> 2. Change the error handling logic of the caller so that EINTR is
>> handled by the next CHECK_FOR_INTERRUPTS() after cleanup.
>>
>> This means that the file descriptor is now already closed and that we
>> handle this the same way we would with a ENOSPC. The reason for this is
>> that there are many places in the code where it is not clear what degree of
>> safety is present for throwing errors using ereport, and the patch needs to
>> be as unobtrusive as possible to facilitate back porting.
>>
>> At this point the patch is for discussion. I have not even compiled it
>> or tested it but maybe there is something wrong with my approach so figured
>> I would send it out early.
>>
>> The major assumptions are:
>> 1. close() will never take longer than the interrupt interval that
>> caused the loop to break.
>> 2. close() does not get interrupted in a way that will not cause cleanup
>> problems later.
>> 3. We will reach the next interrupt check at a reasonable interval and
>> safe spot.
>>
>> Any initial arguments against?
>>
>
> The previous patch had two issues found on internal code review here. I
> am sending a new patch.
>
> This patch compiles. make check passes
> make check-world fails but the errors are the same as found on master and
> involve ecpg.
>
> We will be doing further internal review and verification and then will
> run through pg_indent before final submission.
>
> Changes in this patch:
> 1. Previous patch had backwards check for EINTR in calling function.
> This was fixed.
> 2. In cases where ERROR elevel was passed in, function would return
> instead of error out on case of error.
>
> So in this case we check if we expect to error on error and if so, check
> for interrupts. If not, we go through the normal error handling/logging
> path which might result in an warning that shared memory segment could not
> be allocated followed by an actual meaningful error. I could put that in
> an else if, if that is seen as a good idea, but figured I would raise it as
> a discussion point.
>
Attaching correct patch....
>
>
>>
>> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>
--
Best Regards,
Chris Travers
Head of Database
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
Attachment | Content-Type | Size |
---|---|---|
racecondition.patch | application/octet-stream | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksandr Parfenov | 2018-09-06 11:51:09 | Re: Optimze usage of immutable functions as relation |
Previous Message | Alexander Korotkov | 2018-09-06 11:40:41 | Re: [HACKERS] Bug in to_timestamp(). |