possible issue in postgres_fdw batching

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: possible issue in postgres_fdw batching
Date: 2024-08-18 23:26:58
Message-ID: de632277-2153-44e4-b41d-ba7ef334ab03@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

There's a pgsql-bugs report [1] about a behavior change with batching
enabled in postgres_fdw. While I ultimately came to the conclusion the
behavior change is not a bug, I find it annoying. But I'm not 100% sure
about the "not a bug" conclusion, and and the same time I can't think of
a way to handle it better ...

The short story is that given a foreign table "t" and a function "f"
that queries the data, the batching can change the behavior. The bug
report uses DEFAULT expressions and constraints, but that's not quite
necessary.

Consider this simplified example:

CREATE TABLE t (a INT);

CREATE FOREIGN TABLE f (a INT) SERVER loopback
OPTIONS (table_name 't');

CREATE FUNCTION counter() RETURNS int LANGUAGE sql AS
$$ SELECT COUNT(*) FROM f $$;

And now do

INSERT INTO f SELECT counter() FROM generate_series(1,100);

With batch_size=1 this produces a nice sequence, with batching we start
to get duplicate values - which is not surprising, as the function is
oblivious to the data still in the buffer.

The first question is if this is a bug. At first I thought yes - this is
a bug, but I'm no longer convinced of that. The problem is this function
is inherently unsafe under concurrency - even without batching, it only
takes two concurrent sessions to get the same misbehavior.

Ofc, the concurrency can be prevented by locking the table in some way,
but the batching can be disabled with the same effect.

Anyway, I was thinking about ways to improve / fix this, and I don't
have much. The first thought was that we could inspect the expressions
and disable batching if any of the expressions is volatile.

Unfortunately, the expressions can come from anywhere (it's not just
default values as in the original report), and it's not even easily
accessible from nodeModifyTable. Which is where we decide whether to do
batching, etc. It might be a subquery, values, ...

The other option that just occurred to me is that maybe it would be
possible to flush the batches if we happen to query the same foreign
table from other parts of the plan. In this case it'd mean that every
time we call counter(), we'd flush data. But that doesn't seem very
simple either, because we need to find the nodeModifyTable nodes for the
same foreign table, and we'd need to trigger the flush.

Opinions? Does this qualify as a bug, of just a manifestation of a
function that doesn't work under concurrency?

[1]
https://www.postgresql.org/message-id/20240809030755.jubqv6f6vpxkfkzv@jasonk.me

--
Tomas Vondra

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-08-18 23:40:59 Re: possible issue in postgres_fdw batching
Previous Message Tom Lane 2024-08-18 22:55:34 Re: Cirrus CI for macOS branches 16 and 15 broken