From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Getting sorted data from foreign server |
Date: | 2015-10-16 18:03:57 |
Message-ID: | CA+TgmoYbO2TZ3JQVdrsLUXjc1YWeZEGETw3Q1cvn4GnbY81Ymw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> Attached is the patch which takes care of above comments.
I spent some time on this patch today. But it's still not right.
I've attached a new version which fixes a serious problem with your
last version - having postgresGetForeignPaths do the costing of the
sorted path itself instead of delegating that to
estimate_path_cost_size is wrong. In your version, 10% increment gets
applied to the network transmission costs as well as the cost of
generating the tupes - but only when use_remote_estimate == false. I
fixed this and did some cosmetic cleanup.
But you'll notice if you try this some of postgres_fdw's regression
tests fail. This is rather mysterious:
***************
*** 697,715 ****
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
! -> Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
! Join Filter: (t1.c3 = t2.c3)
-> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
! -> Materialize
Output: t2.c3
! -> Foreign Scan on public.ft2 t2
Output: t2.c3
! Filter: (date(t2.c4) = '01-17-1970'::date)
! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1"
WHERE (("C 1" > 10))
! (15 rows)
EXECUTE st2(10, 20);
c1 | c2 | c3 | c4 | c5
| c6 | c7 | c8
--- 697,718 ----
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
! -> Hash Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
! Hash Cond: (t1.c3 = t2.c3)
-> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
! -> Hash
Output: t2.c3
! -> HashAggregate
Output: t2.c3
! Group Key: t2.c3
! -> Foreign Scan on public.ft2 t2
! Output: t2.c3
! Filter: (date(t2.c4) = '01-17-1970'::date)
! Remote SQL: SELECT c3, c4 FROM "S 1"."T
1" WHERE (("C 1" > 10))
! (18 rows)
What I think is happening here is that the planner notices that
instead of doing a parameterized nestloop, it could pull down the data
already sorted from the remote side, cheaply unique-ify it by using
the ordering provided by the remote side, and then do a standard hash
join. That might well be a sensible approach, but the ORDER BY that
would make it correct doesn't show up in the Remote SQL. I don't know
why that's happening, but it's not good.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
pg_sort_pd_v4.patch | application/x-patch | 16.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2015-10-16 18:19:16 | Re: buildfarm failures on crake and sittella |
Previous Message | Jinyu Zhang | 2015-10-16 18:00:13 | Re: Patch: Optimize memory allocation in function 'bringetbitmap' |