RE: [Proposal] Add foreign-server health checks infrastructure

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Katsuragi Yuta <katsuragiy(at)oss(dot)nttdata(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ted Yu <yuzhihong(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, 'Shubham Khanna' <khannashubham1197(at)gmail(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2024-07-18 10:49:04
Message-ID: TYAPR01MB56923ACAE33FD04A3560F413F5AC2@TYAPR01MB5692.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Fujii-san,

Hi, long time no see :-).
Thanks for reviewing the patch! PSA new version.

> I've just started reviewing them.
>
> Here are the review comments for 0001 patch:
>
> Do we really need postgres_fdw_verify_connection_all()? The proposed feature
> aims to check if all postgres_fdw connections used within the transaction
> are still open. If any of those connections are closed, the transaction
> can't be committed successfully, so users can roll back immediately upon
> detecting a closed connection.
>
> However, postgres_fdw_verify_connection_all() checks all connections made in
> the session, not just those used in the current transaction. This means
> users can't determine if they should roll back the transaction based on
> its return value. Therefore, I'm concerned that
> postgres_fdw_verify_connection_all() might not be very useful. Thoughts?

Right. My primal motivation is to detect the disconnection from remotes and abort
current transaction as soon as possible. In this case, as I posted in [1],
...all() is not so helpful.
I agreed to remove the function from patch set. Done.

> Considering the purpose of this feature, wouldn't it be better to extend
> postgres_fdw_get_connections() to include a "used_in_xact" column
> (indicating whether the connection has been used in the current transaction)
> and a "closed" column (indicating whether the connection has been closed)?
> This approach might be more effective than introducing a new function
> like the postgres_fdw_verify_connection family?
>
> If it's too much to check if the connection is closed by default whenever
> calling postgres_fdw_get_connections(), we could modify it to accept
> an argument indicating whether to perform this check. Thoughts?

Sounds interesting. If we can accept to change the definition of pre-existing
function, it seems better. To keep the default behavior, an input parameter
should be added. Attached patch tried to implement.

> Here are the review comments for 0003 patch:
>
> The source comment in postgres_fdw_get_connections() should mention
> the return value user_name.

Document was updated.

> We also need to handle the case where the user mapping used by
> the connection cache has been dropped. Otherwise, this could
> lead to an error.
>
> -----------------------------
> =# BEGIN;
> =*# SELECT * FROM postgres_fdw_get_connections();
> server_name | user_name | valid
> -------------+-----------+-------
> loopback | public | t
> (1 row)
>
> =*# DROP USER MAPPING FOR public SERVER loopback ;
> =*# SELECT * FROM postgres_fdw_get_connections();
> ERROR: cache lookup failed for user mapping 16409
> -----------------------------

Fixed.

> -SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
> +SELECT server_name, valid FROM postgres_fdw_get_connections() ORDER BY
> 1;
>
> Shouldn't this test also check if the returned user_name is valid?

You meant to say that we should print the user_name, right? Done.

> + server_name | user_name | validvalid
> +-------------+------------------------
> + loopback1 | postgres | t
> + loopback2 | | f
>
> The column name "validvalid" should be "valid".

Right, fixed.

> How can we cause the record with server_name != NULL but user_name = NULL?

Now this can happen when user_name is dropped, but the example was updated.

Below contains a summary of changes.
0001:
- Instead of adding new functions, postgres_fdw_get_connections() was extended.
Some tricks were added to support old versions. I followed pg_stat_statements.c.
Attributes were added after the `valid` to preserve ordering of the old version.
- I found an inconsistency of name between source and doc,
so I unified to postgres_fdw_can_verify_connection().
- Also, test patch (0002) was combined into this.
0002:
- user_name was added after the `valid` to preserve ordering of the old version.
- GetUserMappingFromOid() is allowed to miss a tuple.

[1]: https://www.postgresql.org/message-id/TYAPR01MB58668728393648C2F7DC7C85F5399%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v41-0001-postgres_fdw-Allow-postgres_fdw_get_connections-.patch application/octet-stream 19.4 KB
v41-0002-Extend-postgres_fdw_get_connections-to-return-us.patch application/octet-stream 9.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-07-18 10:54:43 Re: Add mention of execution time memory for enable_partitionwise_* GUCs
Previous Message Masahiro.Ikeda 2024-07-18 10:37:42 RE: Showing applied extended statistics in explain Part 2