Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply
Date: 2023-04-12 14:39:29
Message-ID: 7220887c-e2e3-e79f-38c4-7d9124688ab4@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023/04/12 12:00, Kyotaro Horiguchi wrote:
> At Wed, 12 Apr 2023 03:36:01 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>> Attached patch fixes this issue. It ensures that postgres_fdw only
>> waits
>> for a reply if a cancel request is actually issued. Additionally,
>> it improves PQgetCancel() to set error messages in certain error
>> cases,
>> such as when out of memory occurs and malloc() fails. Moreover,
>> it enhances postgres_fdw to report a warning message when
>> PQgetCancel()
>> returns NULL, explaining the reason for the NULL value.
>>
>> Thought?
>
> I wondered why the connection didn't fail in the first place. After
> digging into it, I found (or remembered) that local (or AF_UNIX)
> connections ignore the timeout value at making a connection. I think

BTW, you can reproduce the issue even when using a TCP connection
instead of a Unix domain socket by specifying a very large number
in the "keepalives" connection parameter for the foreign server.
Here is an example:

-----------------
create server loopback foreign data wrapper postgres_fdw options (host '127.0.0.1', port '5432', keepalives '99999999999');
-----------------

The reason behind this issue is that PQconnectPoll() parses
the "keepalives" parameter value by simply using strtol(),
while PQgetCancel() uses parse_int_param(). To fix this issue,
it might be better to update PQconnectPoll() so that it uses
parse_int_param() for parsing the "keepalives" parameter.

> the real issue here is that PGgetCancel is unnecessarily checking its
> value and failing as a result. Otherwise there would be no room for
> failure in the call to PQgetCancel() at that point in the example
> case.
>
> PQconnectPoll should remove the ignored parameters at connection or
> PQgetCancel should ingore the unreferenced (or unchecked)
> parameters. For example, the below diff takes the latter way and seems
> working (for at least AF_UNIX connections)

To clarify, are you suggesting that PQgetCancel() should
only parse the parameters for TCP connections
if cancel->raddr.addr.ss_family != AF_UNIX?
If so, I think that's a good idea.

> Of course, it's not great that pgfdw_cancel_query_begin() ignores the
> result from PQgetCancel(), but I think we don't need another ereport.

Can you please clarify why you suggest avoiding outputting
the warning message when PQgetCancel() returns NULL?
I think it is important to inform the user when an error
occurs and a cancel request cannot be sent, as this information
can help them identify the cause of the problem (such as
setting an overly large value for the keepalives parameter).

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Gilman 2023-04-12 14:40:28 Note new NULLS NOT DISTINCT on unique index tutorial page
Previous Message Stephen Frost 2023-04-12 14:33:22 Re: longfin missing gssapi_ext.h