postgres_fdw and column casting shippability

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: postgres_fdw and column casting shippability
Date: 2017-05-15 21:42:14
Message-ID: CAMkU=1wuSiR++ujs-O+eAATcOm7MkQGJ1SLBHbDVy7C8TBz8UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I need to do a join between two foreign tables using columns of different
types.

select data from remote2 join remote1 on ((remote2.id)::bigint=remote1.id)
where cutoff > 0.9999;

For demonstration purposes, I use a loop-back foreign server, set up in the
attached sql file.

If I do the join directly on the "foreign" server by specifying the
schemaname where the physical tables live, it uses a sensible join plan,
using an index on cutoff column to get a handful of rows, then casting the
id column and using in index on remote1.id to get each row there.

explain analyze select data from remote.remote2 join remote.remote1 on ((
remote2.id)::bigint=remote1.id) where cutoff > 0.9999;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=5.56..1100.48 rows=100 width=8) (actual
time=0.303..5.598 rows=119 loops=1)
-> Bitmap Heap Scan on remote2 (cost=5.13..334.85 rows=91 width=7)
(actual time=0.112..0.899 rows=105 loops=1)
Recheck Cond: (cutoff > '0.9999'::double precision)
Heap Blocks: exact=105
-> Bitmap Index Scan on remote2_cutoff_idx (cost=0.00..5.11
rows=91 width=0) (actual time=0.062..0.062 rows=105 loops=1)
Index Cond: (cutoff > '0.9999'::double precision)
-> Index Scan using remote1_id_idx on remote1 (cost=0.43..8.40 rows=1
width=16) (actual time=0.038..0.041 rows=1 loops=105)
Index Cond: (id = (remote2.id)::bigint)

But if I go through the foreign machinery, it doesn't use a good plan:

explain analyze select data from remote2 join remote1 on ((remote2.id
)::bigint=remote1.id) where cutoff > 0.9999;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=537.81..76743.81 rows=455000 width=4) (actual
time=75.019..4659.802 rows=119 loops=1)
Hash Cond: (remote1.id = (remote2.id)::bigint)
-> Foreign Scan on remote1 (cost=100.00..35506.00 rows=1000000
width=16) (actual time=1.110..4143.655 rows=1000000 loops=1)
-> Hash (cost=436.67..436.67 rows=91 width=7) (actual
time=2.754..2.754 rows=105 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Foreign Scan on remote2 (cost=105.13..436.67 rows=91 width=7)
(actual time=1.567..2.646 rows=105 loops=1)
Planning time: 29.629 ms
Execution time: 4660.433 ms

I thought it would either push the entire join to the foreign side, or at
least do a foreign index scan on remote2_cutoff_idx, then loop over each
row and do a foreign index scans against remote1_id_idx.

I've tried versions 9.6.3 and 10dev, and neither do what I expected. It
doesn't seem to be a planning problem where it thinks the fast plan is
slower, it just doesn't seem to consider the faster plans as being options
at all. Is there some setting to make it realize the cast is shippable?
Is any of the work being done on postgres_fdw for V11 working towards
fixing this?

Cheers,

Jeff

Attachment Content-Type Size
cast_ship.sql text/plain 1.2 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2017-05-15 22:22:48 Re: postgres_fdw and column casting shippability
Previous Message plukovic 2017-05-11 17:11:27 Re: Speed differences between in executing the same query