From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <fujii(at)postgresql(dot)org> |
Cc: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect |
Date: | 2020-10-08 03:43:17 |
Message-ID: | eeec0120-9ca1-bd20-a00b-5fc5f2c862a1@oss.nttdata.com |
Lists: | pgsql-committers |
On 2020/10/08 0:48, Fujii Masao wrote:
>
>
> On 2020/10/07 22:25, Fujii Masao wrote:
>>
>>
>> On 2020/10/07 12:54, Fujii Masao wrote:
>>>
>>>
>>> On 2020/10/07 11:13, Michael Paquier wrote:
>>>> Hi Fujii-san,
>>>>
>>>> On Tue, Oct 06, 2020 at 01:52:55AM +0000, Fujii Masao wrote:
>>>>> postgres_fdw: reestablish new connection if cached one is detected as broken.
>>>>>
>>>>> In postgres_fdw, once remote connections are established, they are cached
>>>>> and re-used for subsequent queries and transactions. There can be some
>>>>> cases where those cached connections are unavailable, for example,
>>>>> by the restart of remote server. In these cases, previously an error was
>>>>> reported and the query accessing to remote server failed if new remote
>>>>> transaction failed to start because the cached connection was broken.
>>>>>
>>>>> This commit improves postgres_fdw so that new connection is remade
>>>>> if broken connection is detected when starting new remote transaction.
>>>>> This is useful to avoid unnecessary failure of queries when connection is
>>>>> broken but can be reestablished.
>>>>
>>>> lorikeet is reporting that the test introduced by this commit is
>>>> unstable:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2020-10-06%2008%3A28%3A36
>>>
>>> Thanks for letting me know this!
>>>
>>>>
>>>> Some details:
>>>> BEGIN;
>>>> SELECT 1 FROM ft1 LIMIT 1;
>>>> - ?column?
>>>> -----------
>>>> - 1
>>>> -(1 row)
>>>> -
>>>> +ERROR: could not receive data from server: Software caused connection abort
>>>> +CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
>>>
>>> This error means that a new connection was successfully reestablished
>>> after the cached connection was terminated, and then the above connection
>>> error occurred when issuing the "START TRANSACTION" command on that
>>> new connection. There seem to be no suspicious relevant log messages in
>>> the logfile, so I'm not sure yet why this error happened.
>>>
>>> Per the previous discussion at [1], lorikeet sometimes seems to cause
>>> connection-related failures in the regression test. So the cause of the
>>> error that we faced today may also be lorikeet itself.
>>
>> Since it's not good to keep the buildfarm member red, I will revert
>> the commit if I cannot come up with a fix after further
>> investigation.
>>
>> My current guess is just that PQstatus(conn) doesn't indicate
>> CONNECTION_BAD when the above error occurs, which
>> prevents a new connection from being reestablished because of
>> the following check.
>>
>> +        if (PQstatus(entry->conn) != CONNECTION_BAD ||
>> +            entry->xact_depth > 0 ||
>> +            retry_conn)
>> +            PG_RE_THROW();
>
> The error message in discussion is reported when recv() fails and
> errno=ECONNABORTED. As far as I read the code, pqReadData() marks
> the connection as CONNECTION_BAD when errno=ECONNRESET,
> but not when errno=ECONNABORTED. So, since PQstatus(entry->conn)
> doesn't indicate CONNECTION_BAD in the ECONNABORTED case,
> the condition in the above check is true, the error is re-thrown, and
> a new connection is not reestablished.
>
> Therefore, the easy fix is to make libpq mark the connection as
> CONNECTION_BAD even in ECONNABORTED, like we do in ECONNRESET.
Patch attached. This patch also changes errcode_for_socket_access()
so that it uses ERRCODE_CONNECTION_FAILURE rather than
ERRCODE_INTERNAL_ERROR as the SQL error code in the ECONNABORTED case,
as it already does for ECONNRESET. Is this sane?
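
To make the intended behavior concrete, here is a minimal standalone sketch of the idea. It is not the attached patch and not libpq code: every sketch_* name and the SketchConnStatus type are hypothetical stand-ins, and the retry function only mirrors the condition of the postgres_fdw check quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for libpq's connection status (not ConnStatusType). */
typedef enum
{
    SKETCH_CONN_OK,
    SKETCH_CONN_BAD
} SketchConnStatus;

/*
 * Assumption for illustration: status assigned after recv() fails with
 * the given errno.  ECONNABORTED is treated like ECONNRESET, which is
 * the change proposed in this mail.
 */
SketchConnStatus
sketch_status_after_recv_error(int err)
{
    switch (err)
    {
        case ECONNRESET:
        case ECONNABORTED:
            return SKETCH_CONN_BAD;
        default:
            return SKETCH_CONN_OK;
    }
}

/*
 * Mirrors the quoted check: retry only when the connection is marked bad,
 * no remote transaction is open, and no retry has been attempted yet.
 * Returning false corresponds to the PG_RE_THROW() branch.
 */
bool
sketch_should_retry(SketchConnStatus status, int xact_depth, bool retry_conn)
{
    if (status != SKETCH_CONN_BAD || xact_depth > 0 || retry_conn)
        return false;
    return true;
}

/* Usage example: an ECONNABORTED failure outside a transaction is retried once. */
void
sketch_self_check(void)
{
    SketchConnStatus st = sketch_status_after_recv_error(ECONNABORTED);

    assert(sketch_should_retry(st, 0, false));
    assert(!sketch_should_retry(st, 0, true));
}
```

If ECONNABORTED were left out of the switch, the status would stay "OK" in this sketch and sketch_should_retry() would return false, which corresponds to the failure seen on lorikeet.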
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
econnaborted_as_conn_error.patch | text/plain | 1.6 KB |