Re: Same query 10000x More Time

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Avi Weinberg <AviW(at)gilat(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Same query 10000x More Time
Date: 2022-01-06 16:50:13
Message-ID: CAM+6J96i4Wad41fLvcJB_anBGQgAxyg-RcBirtAH0=QcVYTEmw@mail.gmail.com
Lists: pgsql-performance

On Thu, 6 Jan 2022 at 20:01, Avi Weinberg <AviW(at)gilat(dot)com> wrote:

> Thanks for the input
>
>
>
> postgres_fdw seems to bring the entire table even if all I use in the join
> is just the id from the remote table. I know it is possible to query for
> the missing ids and then perform the delete, but I wonder why all types of
> joins are so inefficient.
>
>
>

Just for fun (please do not do this), I tried out multiple options for joining a
small local table to a huge remote table, with settings that force the planner to
skip each plan type in turn.

postgres@db:~/playground$ psql
psql (14beta1)
Type "help" for help.

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# \x
Expanded display is on.
localdb=# table pg_foreign_server;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------
oid        | 85462
srvname    | remote_server
srvowner   | 10
srvfdw     | 85458
srvtype    |
srvversion |
srvacl     |
srvoptions | {dbname=remotedb,use_remote_estimate=true,fdw_startup_cost=0,fdw_tuple_cost=0,fetch_size=10000}
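
(For reference, a server and foreign table like the above can be created roughly as
below; the exact statements are an approximation reconstructed from srvoptions, with
credentials and host options omitted.)

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (dbname 'remotedb', use_remote_estimate 'true',
           fdw_startup_cost '0', fdw_tuple_cost '0', fetch_size '10000');

CREATE USER MAPPING FOR postgres SERVER remote_server;  -- credentials omitted

CREATE SCHEMA remote_schema;
IMPORT FOREIGN SCHEMA public LIMIT TO (remote_table)
  FROM SERVER remote_server INTO remote_schema;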

localdb=# \x
Expanded display is off.
localdb=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t | table | postgres
(1 row)

localdb=# \det remote_schema.remote_table;
List of foreign tables
Schema | Table | Server
---------------+--------------+---------------
remote_schema | remote_table | remote_server
(1 row)

localdb=# \c remotedb;
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt
List of relations
Schema | Name | Type | Owner
--------+--------------+-------+----------
public | remote_table | table | postgres
(1 row)

remotedb=# select count(1) from remote_table;
count
--------
100000
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# select count(1) from t;
count
-------
10
(1 row)
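
(The test data itself is nothing special; the sketch below is only an approximation,
since the plans further down show 90000 of the 100000 remote rows matching, so the
original remote t_id distribution was slightly different.)

-- on remotedb: a large table whose t_id mostly matches the local ids
CREATE TABLE remote_table (rt_id int PRIMARY KEY, t_id int);
INSERT INTO remote_table
SELECT g, (g % 10) + 1 FROM generate_series(1, 100000) AS g;

-- on localdb: a tiny table of ids
CREATE TABLE t (t_id int PRIMARY KEY, t_col int);
INSERT INTO t SELECT g, g FROM generate_series(1, 10) AS g;
ANALYZE t;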

*# each of the SET options below forces the optimizer to skip that plan route*
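(The settings toggled below are enable_seqscan, enable_hashjoin, enable_resultcache
and enable_mergejoin; afterwards they can be put back with:)

RESET enable_seqscan;
RESET enable_hashjoin;
RESET enable_resultcache;
RESET enable_mergejoin;
-- or simply: RESET ALL;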
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..2817.97 rows=100000 width=16) (actual time=5.814..63.310 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Hash Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.797..47.329 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         *Remote SQL: SELECT rt_id, t_id FROM public.remote_table*
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.009..0.010 rows=10 loops=1)
         Output: t.t_id, t.t_col
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on public.t  (cost=0.00..1.10 rows=10 width=8) (actual time=0.005..0.006 rows=10 loops=1)
               Output: t.t_id, t.t_col
 Planning Time: 4.464 ms
 Execution Time: 65.995 ms
(14 rows)

localdb=# set enable_seqscan TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=12.41..2829.16 rows=100000 width=16) (actual time=5.380..61.028 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Hash Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.362..45.625 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         *Remote SQL: SELECT rt_id, t_id FROM public.remote_table*
   ->  Hash  (cost=12.29..12.29 rows=10 width=8) (actual time=0.011..0.011 rows=10 loops=1)
         Output: t.t_id, t.t_col
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.005..0.008 rows=10 loops=1)
               Output: t.t_id, t.t_col
 Planning Time: 0.696 ms
 Execution Time: 63.666 ms
(14 rows)

localdb=# set enable_hashjoin TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..4821.93 rows=100000 width=16) (actual time=5.199..75.817 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.186..46.152 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         *Remote SQL: SELECT rt_id, t_id FROM public.remote_table*
   ->  Result Cache  (cost=0.15..0.16 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=100000)
         Output: t.t_id, t.t_col
         Cache Key: r.t_id
         Hits: 99990  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         ->  Index Scan using t_pkey on public.t  (cost=0.14..0.15 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=10)
               Output: t.t_id, t.t_col
               Index Cond: (t.t_id = r.t_id)
 Planning Time: 0.692 ms
 Execution Time: 78.512 ms
(15 rows)

localdb=# set enable_resultcache TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=479.10..5847.98 rows=100000 width=16) (actual time=12.855..66.094 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   Inner Unique: true
   Merge Cond: (r.t_id = t.t_id)
   ->  Foreign Scan on remote_schema.remote_table r  (cost=0.29..4586.89 rows=100000 width=8) (actual time=6.235..55.329 rows=100000 loops=1)
         Output: r.rt_id, r.t_id
         *Remote SQL: SELECT rt_id, t_id FROM public.remote_table ORDER BY t_id ASC NULLS LAST*
   ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.006..0.024 rows=9 loops=1)
         Output: t.t_id, t.t_col
 Planning Time: 0.704 ms
 Execution Time: 68.724 ms
(11 rows)

localdb=# set enable_mergejoin TO 0;
SET
localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id);
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=113.94..8830.28 rows=100000 width=16) (actual time=11.576..89.465 rows=90000 loops=1)
   Output: t.t_id, t.t_col, r.rt_id, r.t_id
   ->  Index Scan using t_pkey on public.t  (cost=0.14..12.29 rows=10 width=8) (actual time=0.009..0.034 rows=10 loops=1)
         Output: t.t_id, t.t_col
   ->  Foreign Scan on remote_schema.remote_table r  (cost=113.80..781.80 rows=10000 width=8) (actual time=7.648..8.108 rows=9000 loops=10)
         Output: r.rt_id, r.t_id
         *Remote SQL: SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))*
 Planning Time: 0.667 ms
 Execution Time: 92.131 ms
(9 rows)

From the remote server's logs for the last case: it opens a new cursor for every
matching id, and it is still the *slowest*.
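
(The statements below come from the remote server's log; one way to capture them,
assuming statement logging is not already enabled there, is:)

-- on remotedb, as a superuser:
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();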

2022-01-06 22:10:48.665 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR
SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.665 IST [2318] DETAIL: parameters: $1 = '1'
2022-01-06 22:10:48.665 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.679 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.679 IST [2318] LOG: statement: CLOSE c1
2022-01-06 22:10:48.679 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR
SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.679 IST [2318] DETAIL: parameters: $1 = '2'
2022-01-06 22:10:48.679 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.686 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.687 IST [2318] LOG: statement: CLOSE c1
2022-01-06 22:10:48.687 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR
SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.687 IST [2318] DETAIL: parameters: $1 = '3'
2022-01-06 22:10:48.687 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.698 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.698 IST [2318] LOG: statement: CLOSE c1
2022-01-06 22:10:48.698 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR
SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))
2022-01-06 22:10:48.698 IST [2318] DETAIL: parameters: $1 = '4'
2022-01-06 22:10:48.698 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.705 IST [2318] LOG: statement: FETCH 10000 FROM c1
2022-01-06 22:10:48.705 IST [2318] LOG: statement: CLOSE c1

So I think I would just trust the optimizer, or rewrite the query so as to gather the
predicate values locally first and then pass them to the remote side, or use a
materialized view to maintain a stale copy of the remote table on my local db, etc.
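
(Rough sketches of those last two options, untested against the original schema;
check EXPLAIN (VERBOSE) to confirm the predicate actually shows up in the Remote SQL.)

-- option 1: gather the local ids first, then ship them as one array parameter,
-- so the FDW can send a single selective qual instead of pulling the whole table
DO $$
DECLARE
  local_ids int[];
BEGIN
  SELECT array_agg(t_id) INTO local_ids FROM t;
  DELETE FROM remote_schema.remote_table r
  WHERE NOT (r.t_id = ANY (local_ids));
END $$;

-- option 2: keep a (stale) local copy of the remote table and join against that,
-- refreshing it on whatever schedule is acceptable
CREATE MATERIALIZED VIEW remote_table_local AS
  SELECT rt_id, t_id FROM remote_schema.remote_table;
CREATE UNIQUE INDEX ON remote_table_local (rt_id);
-- later, to refresh without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY remote_table_local;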
