Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <fujii(at)postgresql(dot)org>
Cc: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect
Date: 2020-10-07 13:25:14
Message-ID: fb96a78e-54ad-832f-b2be-d589abdfd7df@oss.nttdata.com
Lists: pgsql-committers

On 2020/10/07 12:54, Fujii Masao wrote:
>
>
> On 2020/10/07 11:13, Michael Paquier wrote:
>> Hi Fujii-san,
>>
>> On Tue, Oct 06, 2020 at 01:52:55AM +0000, Fujii Masao wrote:
>>> postgres_fdw: reestablish new connection if cached one is detected as broken.
>>>
>>> In postgres_fdw, once remote connections are established, they are cached
>>> and re-used for subsequent queries and transactions. There can be some
>>> cases where those cached connections are unavailable, for example,
>>> because the remote server was restarted. Previously, in these cases, if a
>>> new remote transaction failed to start because the cached connection was
>>> broken, an error was reported and the query accessing the remote server failed.
>>>
>>> This commit improves postgres_fdw so that a new connection is made
>>> if a broken connection is detected when starting a new remote transaction.
>>> This avoids unnecessary query failures when the connection is
>>> broken but can be reestablished.
>>
>> lorikeet is reporting that the test introduced by this commit is
>> unstable:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2020-10-06%2008%3A28%3A36
>
> Thanks for letting me know this!
>
>>
>> Some details:
>>   BEGIN;
>>   SELECT 1 FROM ft1 LIMIT 1;
>> - ?column?
>> -----------
>> -        1
>> -(1 row)
>> -
>> +ERROR:  could not receive data from server: Software caused connection abort
>> +CONTEXT:  remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
>
> This error means that a new connection was successfully reestablished
> after the cached connection was terminated, and that the above connection
> error then occurred when issuing the "START TRANSACTION" command on
> that new connection. There seem to be no suspicious related messages in
> the logfile, so I'm not yet sure why this error happened.
>
> Per the previous discussion at [1], lorikeet sometimes seems to cause
> connection-related failures in the regression tests. So the error we
> faced today may also be caused by lorikeet itself.

Since it's not good to keep the buildfarm member red, I will revert
the commit unless further investigation turns up a fix.

My current guess is that PQstatus(conn) does not report
CONNECTION_BAD when the above error occurs, which prevents
a new connection from being reestablished because of the
following check:

+    if (PQstatus(entry->conn) != CONNECTION_BAD ||
+        entry->xact_depth > 0 ||
+        retry_conn)
+        PG_RE_THROW();
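To make this guess concrete, the condition above can be isolated as a pure function. This is a hedged sketch, not the patch itself: `sketch_conn_status` is a hypothetical stand-in for libpq's `ConnStatusType` (the real enum in libpq-fe.h has more members), `sketch_should_rethrow()` is a hypothetical name, and the comments on `xact_depth` and `retry_conn` describe their assumed meaning. If a socket-level error leaves the status at anything other than CONNECTION_BAD, the first operand is already true, so the error is re-thrown and no reconnection is attempted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for libpq's ConnStatusType. */
typedef enum
{
    SKETCH_CONNECTION_OK,
    SKETCH_CONNECTION_BAD
} sketch_conn_status;

/*
 * Mirrors the quoted check: returns true when the error must be re-thrown
 * instead of reestablishing the connection.
 */
static bool sketch_should_rethrow(sketch_conn_status status, int xact_depth,
                                  bool retry_conn)
{
    assert(xact_depth >= 0);
    return status != SKETCH_CONNECTION_BAD || /* connection not marked bad */
           xact_depth > 0 ||                  /* assumed: remote xact already open */
           retry_conn;                        /* assumed: reconnection already tried */
}
```

Only the combination of a CONNECTION_BAD status, no open remote transaction and no earlier retry leads to a reconnection attempt; every other combination re-throws the original error.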

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
