Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date: 2020-12-11 18:01:53
Message-ID: CALj2ACVFfsmAMyYVidXWfu4URST2xwO20NSYrHUQwxG070poxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
<masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2020/12/11 19:16, Bharath Rupireddy wrote:
> > On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >>> + /* We only look for active and open remote connections. */
> >>> + if (entry->invalidated || !entry->conn)
> >>> + continue;
> >>>
> >>> We should return even invalidated entry because it has still cached connection?
> >>>
> >>
> >> I checked this point earlier, for invalidated connections, the tuple
> >> returned from the cache is also invalid and the following error will
> >> be thrown. So, we can not get the server name for that user mapping.
> >> Cache entries too would have been invalidated after the connection is
> >> marked as invalid in pgfdw_inval_callback().
> >>
> >> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> >> if (!HeapTupleIsValid(umaptup))
> >> elog(ERROR, "cache lookup failed for user mapping with OID %u",
> >> entry->key);
> >>
> >
> > I further checked on returning invalidated connections in the output
> > of the function. Actually, the reason I'm seeing a null tuple from sys
> > cache (and hence the error "cache lookup failed for user mapping with
> > OID xxxx") for an invalidated connection is that the user mapping
> > (with OID entry->key that exists in the cache) is getting dropped, so
> > the sys cache returns null tuple. The use case is as follows:
> >
> > 1) Create a server, role, and user mapping of the role with the server
> > 2) Run a foreign table query, so that the connection related to the
> > server gets cached
> > 3) Issue DROP OWNED BY for the role created, since the user mapping is
> > dependent on that role, it gets dropped from the catalogue table and
> > an invalidation message will be pending to clear the sys cache
> > associated with that user mapping.
> > 4) Now, if I do select * from postgres_fdw_get_connections() or for
> > that matter any query, at the beginning the txn
> > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> > gets called and marks the cached entry as invalidated. Remember the
> > reason for this invalidation message is that the user mapping with the
> > OID entry->key is dropped from 3). Later in
> > postgres_fdw_get_connections(), when we search the sys cache with
> > entry->key for that invalidated connection, since the user mapping is
> > dropped from the system, null tuple is returned.
>
> Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will not be closed even by the subsequent access to the foreign server and will remain until the backend exits. Right?

It will be first marked as invalidated via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
and on the next use of that connection invalidated connections are
disconnected and reconnected.

if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
{
elog(DEBUG3, "closing connection %p for option changes to take effect",
entry->conn);
disconnect_pg_server(entry);
}

> If so, this seems like a connection-leak bug, at least for me.... Thought?
>

It's not a leak. The comment before pgfdw_inval_callback() [1]
explains why we can not immediately close/disconnect the connections
in pgfdw_inval_callback() after marking them as invalidated.

Here is the scenario how in the midst of a txn we get invalidation
messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
happens):

1) select from foreign table with server1, usermapping1 in session1
2) begin a top txn in session1, run a few foreign queries that open up
sub txns internally. meanwhile alter/drop server1/usermapping1 in
session2, then at each start of sub txn also we get to process the
invalidation messages via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
So, if we disconnect right after marking invalidated in
pgfdw_inval_callback, that's a problem since we are in a sub txn under
a top txn.

I don't think we can do something here and disconnect the connections
right after the invalidation happens. Thoughts?

[1]
/*
* Connection invalidation callback function
*
* After a change to a pg_foreign_server or pg_user_mapping catalog entry,
* mark connections depending on that entry as needing to be remade.
* We can't immediately destroy them, since they might be in the midst of
* a transaction, but we'll remake them at the next opportunity.
*
* Although most cache invalidation callbacks blow away all the related stuff
* regardless of the given hashvalue, connections are expensive enough that
* it's worth trying to avoid that.
*
* NB: We could avoid unnecessary disconnection more strictly by examining
* individual option values, but it seems too much effort for the gain.
*/
static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-12-11 18:04:38 Re: pg_basebackup test coverage
Previous Message Bruce Momjian 2020-12-11 18:01:14 Re: Proposed patch for key managment