From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: PQexec() hangs on OOM |
Date: | 2015-09-04 07:25:35 |
Message-ID: | CAB7nPqRWbL15EFVQdEzcgR4aUrTCtkCJKvUoRkQysgbCjEezKQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Fri, Sep 4, 2015 at 2:07 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Aug 31, 2015 at 12:55 PM, Michael Paquier <
> michael(dot)paquier(at)gmail(dot)com> wrote:
>>
>> On Sat, Aug 29, 2015 at 10:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>>>
>>>
>>> One thing that looks slightly non-elegant about this approach is that
>>> new async status (PGASYNC_FATAL) is used to deal with errors only
>>> in few paths in function pqParseInput3() and not-other paths which should
>>> be okay if there is no other better way.
>>>
>>
>> I considered using this logic in more code paths, but I kept focused on
>> having a not-too-invasive back-patchable patch as the refactoring that
>> would occur may be too heavy for what is usually pushed to the stable
>> branches. Do you think it would be better to get a cleaner refactoring
>> patch first and globalize the approach with PGASYNC_FATAL?
>>
>
> No wait, lets first try to see if this the best fix for Copy path.
>
Sure. Thanks. (We could remove some dead code of libpq though).
> The problem with the current approach is that even if we are able to
> receive error, it doesn't completely clear the previous command
> execution. As an example, if using debugger, I make getCopyStart()
> return OOM error, then the next command execution fails.
>
> postgres=# copy t1 from stdin;
> out of memory
> postgres=# copy t1 from stdin;
> another command is already in progress
>
Oops. Indeed. Using this new flag is obviously not a good idea because the
connection remains in a state that it considers as PGASYNC_FATAL and
actually it may not even finish to consume the remaining messages. Perhaps
I was too sleepy last time I tested that, well...
> It was actually a far cleaner approach to have a failure flag to decide if
>> based on the async state process should move on to a failure code path or
>> not.
>>
>
> I think you have already tried, but it seems to me that if we can handle
> it based on result status, that would be better, rather than introducing
> new state, but anyway lets first try to sort out above reported problem.
>
So, looking at that again with a fresh eye, I noticed that
getCopyDataMessage has a similar error handling when it fails on
pqCheckInBufferSpace. So looking at what I added, it seems that I am trying
to duplicate the protocol error handling even if everything is already
present, so that's really overdoing it. Hence, let's simply drop this idea
of new flag PGASYNC_FATAL and instead treat the OOM as a sync handling
error with the server, like in the patch attached.
When emulating an OOM with this patch, I am getting this error instead of
the infinite loop, which looks acceptable to me:
=# copy aa to stdout;
out of memory
lost synchronization with server: got message type "H", length 5
The connection to the server was lost. Attempting reset: Succeeded.
The extra message handling I have added at the end of getCopyStart and
getParamDescriptions still looks more adapted to me when a failure happens,
so I kept it.
Regards,
--
Michael
Attachment | Content-Type | Size |
---|---|---|
20150904_libpq_oom_v2.patch | application/x-patch | 7.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Teodor Sigaev | 2015-09-04 08:13:47 | Re: BUG #13440: unaccent does not remove all diacritics |
Previous Message | Amit Kapila | 2015-09-04 05:07:07 | Re: PQexec() hangs on OOM |