Re: Bypassing cursors in postgres_fdw to enable parallel plans

From: KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr>
To: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bypassing cursors in postgres_fdw to enable parallel plans
Date: 2025-03-04 16:14:29
Message-ID: CAEV8nuk7Abj5Kh6X1SgeJo39fPNrt1O1LTXkw0nFAUpd5DoKFQ@mail.gmail.com
Lists: pgsql-hackers

Hi Rafia,
Sorry for the late response. I have completed my tests, and parallel
workers were successfully launched on the remote server. Below are the
details of my setup and test results.

# Machine Details
CPU: 4 cores
Memory: 8GB
PostgreSQL Version: 17.2 (compiled from source)
OS: Rocky Linux 8.10
Two VM instances (local: 192.168.1.68, remote: 192.168.1.69)

# PostgreSQL Configuration (For Demonstration)

logging_collector = on
log_truncate_on_rotation = on
log_rotation_size = 1GB
log_filename = 'postgresql-%a.log'
log_line_prefix = '%t [%p]: %q bg=%b, db=%d, usr=%u, client=%h, qryId=%Q, txId=%x, app=%a, line=%l'
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_maintenance_workers = 2
max_parallel_workers = 4
debug_print_plan = on
track_functions = 'all'
log_statement = 'all'
postgres_fdw.use_cursor = false
shared_preload_libraries = 'pg_stat_statements,auto_explain'
compute_query_id = on
min_parallel_table_scan_size = 0
parallel_tuple_cost = 0
parallel_setup_cost = 0
auto_explain.log_min_duration = '0.00ms'
auto_explain.log_analyze = 'on'

# Memory Settings
effective_cache_size = '4GB'
shared_buffers = '1638MB'
work_mem = '100MB'
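
As an optional sanity check (a sketch, not part of the original run), the settings that matter for this test can be verified from a session; note that postgres_fdw.use_cursor is the GUC added by the patch under discussion, so it is only meaningful once the patched postgres_fdw is installed:

testdb=# SHOW max_parallel_workers_per_gather;
testdb=# SHOW parallel_setup_cost;
testdb=# SELECT current_setting('postgres_fdw.use_cursor');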

# Test Setup
# Creating pgbench Tables

$ pgbench -i -s 50 testdb
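
For reference, scale factor 50 creates 5,000,000 rows in pgbench_accounts, so the predicate aid > 1000 leaves roughly 4,999,000 matching rows; this is consistent with the remote Parallel Seq Scan below reporting about 1.67 million rows in each of the three processes (the leader plus two workers).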

# Running SQL Tests on the Local Machine (192.168.1.68)

psql> CREATE EXTENSION postgres_fdw;
psql> CREATE SERVER fdwtest FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.168.1.69', dbname 'testdb');
psql> CREATE USER MAPPING FOR postgres SERVER fdwtest OPTIONS (user 'postgres');
psql> CREATE SCHEMA fdwtest;
psql> IMPORT FOREIGN SCHEMA public FROM SERVER fdwtest INTO fdwtest;
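
As a quick optional check (not part of the original run), the imported foreign tables and the server definition can be inspected before running the queries:

psql> \det fdwtest.*
psql> SELECT srvname, srvoptions FROM pg_foreign_server;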

# Query 1: Counting Rows in a Foreign Table
testdb=# explain analyze select count(*) from fdwtest.pgbench_accounts where aid> 1000;
                                             QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────
 Foreign Scan  (cost=102.84..155.73 rows=1 width=8) (actual time=457.965..457.967 rows=1 loops=1)
   Relations: Aggregate on (pgbench_accounts)
 Planning Time: 0.661 ms
 Execution Time: 458.802 ms
(4 rows)

Time: 461.944 ms

# Query 2: Fetching a Single Row from Foreign Table
testdb=# explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
                                                     QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Foreign Scan on pgbench_accounts  (cost=100.00..100.24 rows=1 width=4) (actual time=9.335..9.336 rows=1 loops=1)
 Planning Time: 0.452 ms
 Execution Time: 10.304 ms
(3 rows)

Time: 12.605 ms
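
Not shown above, but running EXPLAIN with VERBOSE on the local side also prints the Remote SQL that postgres_fdw deparses and ships, which should match the statements that appear in the remote logs below, e.g.:

testdb=# explain (verbose, costs off) select count(*) from fdwtest.pgbench_accounts where aid> 1000;

For this query the plan should include a line like "Remote SQL: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))".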

## PostgreSQL Logs
## Local Machine Logs (192.168.1.68) – Cropped

bg=client backend, db=testdb, usr=postgres, client=[local] , qryId=-6000156370405137929 , txId=0, app=psql, line=14 LOG: duration: 457.971 ms plan:
        Query Text: explain analyze select count(*) from fdwtest.pgbench_accounts where aid> 1000;
        Foreign Scan  (cost=102.84..155.73 rows=1 width=8) (actual time=457.965..457.967 rows=1 loops=1)
          Relations: Aggregate on (pgbench_accounts)
..
STATEMENT: explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
bg=client backend, db=testdb, usr=postgres, client=[local] , qryId=5870070636604972000 , txId=0, app=psql, line=19 LOG: duration: 9.339 ms plan:
        Query Text: explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
        Foreign Scan on pgbench_accounts  (cost=100.00..100.24 rows=1 width=4) (actual time=9.335..9.336 rows=1 loops=1)

## Remote Machine Logs (192.168.1.69) – Cropped
STATEMENT: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-7176633966431489392 , txId=0, app=postgres_fdw, line=22 LOG: execute <unnamed>: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=23 LOG: statement: COMMIT TRANSACTION
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-2835399305386018931 , txId=0, app=postgres_fdw, line=24 LOG: duration: 455.461 ms plan:
        Query Text: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
        Finalize Aggregate  (cost=113216.98..113216.99 rows=1 width=8) (actual time=454.321..455.452 rows=1 loops=1)
          ->  Gather  (cost=113216.97..113216.98 rows=2 width=8) (actual time=454.173..455.443 rows=3 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Partial Aggregate  (cost=113216.97..113216.98 rows=1 width=8) (actual time=449.393..449.394 rows=1 loops=3)
                      ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..108009.67 rows=2082920 width=0) (actual time=0.255..343.378 rows=1666333 loops=3)
                            Filter: (aid > 1000)
                            Rows Removed by Filter: 333
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=25 LOG: statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
..
STATEMENT: SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=5994602644362067232 , txId=0, app=postgres_fdw, line=29 LOG: execute <unnamed>: SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=30 LOG: statement: COMMIT TRANSACTION
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-2835399305386018931 , txId=0, app=postgres_fdw, line=31 LOG: duration: 7.983 ms plan:
        Query Text: SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
        Limit  (cost=0.00..0.02 rows=1 width=4) (actual time=0.836..7.974 rows=1 loops=1)
          ->  Gather  (cost=0.00..108009.67 rows=4999007 width=4) (actual time=0.834..7.972 rows=1 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..108009.67 rows=2082920 width=4) (actual time=0.270..0.271 rows=1 loops=3)
                      Filter: (aid > 1000)
                      Rows Removed by Filter: 333

If you would like me to conduct more complex tests, feel free to let me know.
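
One candidate for the next round -- sketched here only, not yet run -- is the simultaneous-query case discussed earlier in this thread, where more than one foreign scan is open at the same time and the Tuplestore path should be taken. Since a join of two foreign tables on the same server definition may simply be pushed down as a single remote query, a second server definition pointing at the same remote host would force the join to be executed locally, for example:

psql> CREATE SERVER fdwtest2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.168.1.69', dbname 'testdb');
psql> CREATE USER MAPPING FOR postgres SERVER fdwtest2 OPTIONS (user 'postgres');
psql> CREATE SCHEMA fdwtest2;
psql> IMPORT FOREIGN SCHEMA public LIMIT TO (pgbench_branches) FROM SERVER fdwtest2 INTO fdwtest2;
testdb=# explain analyze select count(*) from fdwtest.pgbench_accounts a join fdwtest2.pgbench_branches b on a.bid = b.bid where a.aid > 1000;
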
Best regards,
Kenan YILMAZ

KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr> wrote on Mon, 17 Feb 2025 at 17:09:

> Hi Rafia,
>
> Based on our previous private discussion, thanks for the update and for
> clarifying the current state of the patch.
> I understand that more substantial changes are on the way, so I’ll focus
> on relevant test scenarios rather than performance testing at this stage.
>
> I will proceed with expanding the scenarios, including:
>
> Multiple postgres_fdw servers are active at the same time
> Multiple connections from the same postgres_fdw are active concurrently
> Multiple transactions run simultaneously on a single connection
> Multiple sessions operate from a single active connection
>
> I will submit the results of these tests to this mail thread so that they
> can benefit the broader community as well. Additionally, once you publish
> the updated version of the patch, I will rerun the tests with the latest
> changes and share the updated results.
>
> Best Regards,
>
> Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> wrote on Mon, 17 Feb 2025 at 16:46:
>
>> On Tue, 14 Jan 2025 at 18:33, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>>> On Mon, Jan 6, 2025 at 3:52 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
>>> wrote:
>>> > Now, to overcome this limitation, I have worked on this idea
>>> > (suggested by my colleague Bernd Helmle) of bypassing the cursors. The
>>> > way it works is as follows: there is a new GUC introduced,
>>> > postgres_fdw.use_cursor, which when unset uses the mode without the
>>> > cursor. Now, it uses PQsetChunkedRowsMode in create_cursor when
>>> > non-cursor mode is used. The size of the chunk is the same as the
>>> > fetch_size. Now in fetch_more_data, when non-cursor mode is used,
>>> > pgfdw_get_next_result is used to get the chunk in PGresult and
>>> > processed in the same manner as before.
>>> >
>>> > Now, the issue comes when there are simultaneous queries, which is the
>>> > case with the join queries where all the tables involved in the join
>>> > are at the local server. Because in that case we have multiple cursors
>>> > opened at the same time and without a cursor mechanism we do not have
>>> > any information or any other structure to know what to fetch from
>>> > which query. To handle that case, we have a flag only_query, which is
>>> > unset as soon as we have assigned the cursor_number >= 2, in
>>> > postgresBeginForeignScan. Now, in fetch_more_data, when we find out
>>> > that only_query is unset, then we fetch all the data for the query and
>>> > store it in a Tuplestore. These tuples are then transferred to the
>>> > fsstate->tuples and then processed as usual.
>>> >
>>> > So yes, there is a performance drawback in the case of simultaneous
>>> > queries; however, the ability to use parallel plans is really an added
>>> > advantage for the users. Plus, we can keep things as before with this
>>> > new GUC -- use_cursor, in case we are losing more for some workloads.
>>> > So, in short, I feel hopeful that this could be a good idea and a good
>>> > time to improve postgres_fdw.
>>>
>>> Hi,
>>>
>>> I think it might have been nice to credit me in this post, since I
>>> made some relevant suggestions here off-list, in particular the idea
>>> of using a Tuplestore when there are multiple queries running. But I
>>> don't think this patch quite implements what I suggested. Here, you
>>> have a flag only_query which gets set to true at some point in time
>>> and thereafter remains true for the lifetime of a session. That means,
>>> I think, that all future queries will use the tuplestore even though
>>> there might not be multiple queries running any more, which doesn't
>>> seem like what we want. And, actually, this looks like it will be set
>>> as soon as you reach the second query in the same transaction, even if
>>> the two queries don't overlap. I think what you want to do is test
>>> whether, at the point where we would need to issue a new query, an
>>> existing query is already running. If not, move that
>>> query's remaining results into a Tuplestore so you can issue the new
>>> query.
>>>
>>> I'm not sure what the best way to implement that is, exactly. Perhaps
>>> fsstate->conn_state needs to store some more details about the
>>> connection, but that's just a guess. I don't think a global variable
>>> is what you want. Not only is that session-lifetime, but it applies
>>> globally to every connection to every server. You want to test
>>> something that is specific to one connection to one server, so it
>>> needs to be part of a data structure that is scoped that way.
>>>
>>> I think you'll want to figure out a good way to test this patch. I'm
>>> not yet sure whether this patch needs automated test cases or whether
>>> they can reasonably be written, but you at least want to have a good
>>> procedure for manual validation, so that you can verify that the
>>> Tuplestore is used in all the cases where it needs to be and,
>>> hopefully, no others.
>>>
>>> --
>>> Robert Haas
>>> EDB: http://www.enterprisedb.com
>>>
>>
>> Indeed you are right.
>> Firstly, please accept my apologies for not mentioning you in the credits
>> for this work. Thanks a lot for your efforts; the discussions with you
>> were helpful in shaping this patch and bringing it to this level.
>>
>> Next, yes, the last version was using the tuplestore for every query after
>> the second one within the same transaction. To overcome this, I now detect
>> whether any other query is running simultaneously with the current one:
>> an int variable num_queries is incremented on every call of
>> postgresBeginForeignScan and decremented on every call of
>> postgresEndForeignScan. If queries run simultaneously, num_queries becomes
>> greater than 1, and the tuplestore is used only in that case. In other
>> words, if postgresBeginForeignScan is called again before the matching
>> postgresEndForeignScan, the queries are treated as simultaneous.
>>
>> I couldn't really find an automated method of testing this, but I did it
>> manually by debugging and/or printing log statements in
>> postgresBeginForeignScan, postgresEndForeignScan, and fetch_more_data to
>> confirm that simultaneous queries really do occur and that only they use
>> the tuplestore. The case of simultaneous queries I found was the join
>> query: it creates the cursor for one side of the join and retrieves the
>> first tuples for it, then creates the next cursor for the other side of
>> the join and reads all the tuples for that query, and only then comes back
>> to the first cursor and retrieves all of its tuples. It works similarly
>> for queries with n tables in the join: there will be n open cursors at a
>> time, which are then closed one by one in descending order of
>> cursor_number. I will think more about testing this and will try to come
>> up with a script (in the best case), or at least a set of steps, to
>> confirm that the tuplestore is used only in the required cases.
>>
>> For the regular testing of this feature, I think running all the existing
>> postgres_fdw regression tests with the new GUC postgres_fdw.use_cursor set
>> to false should suffice. What do you think? However, at the moment the
>> regression tests fail when non-cursor mode is used. Some queries just need
>> an ORDER BY because the row order changes in non-cursor mode, but others
>> require more complex changes; I am working on them.
>>
>> In this version of the patch I have added only the changes mentioned
>> above and not the regression test modification.
>>
>> --
>> Regards,
>> Rafia Sabih
>> CYBERTEC PostgreSQL International GmbH
>>
>
