From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | robertmhaas(at)gmail(dot)com |
Cc: | hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Asynchronous execution on FDW |
Date: | 2015-07-22 07:10:17 |
Message-ID: | 20150722.161017.153211073.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Hello, thank you for the comment.
At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ(at)mail(dot)gmail(dot)com>
> On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > At a quick glance, I think this has all the same problems as starting the
> > execution at ExecInit phase. The correct way to do this is to kick off the
> > queries in the first IterateForeignScan() call. You said that "ExecProc
> > phase does not fit" - why not?
>
> What exactly are those problems?
>
> I can think of these:
>
> 1. If the scan is parametrized, we probably can't do it for lack of
> knowledge of what they will be. This seems easy; just don't do it in
> that case.
If we do it outside (before) the ExecProc phase, we can kick off the
foreign scans early only for the first shot.
Nestloop
  -> SeqScan
  -> Append
       -> Foreign (Index) Scan
       -> Foreign (Index) Scan
       ..
This plan presupposes a precise (at least to some extent) estimate
for the remote queries, but async execution within the ExecProc phase
would be effective in this case.
> 2. It's possible that we're down inside some subtree of the plan that
> won't actually get executed. This is trickier.
In the current postgres_fdw, this is handled simply by abandoning the
queued result and then closing the cursor.
> Consider this:
>
> Append
>   -> Foreign Scan
>   -> Foreign Scan
>   -> Foreign Scan
>   <repeat 17 more times>
>
> If we don't start each foreign scan until the first tuple is fetched,
> we will not get any benefit here, because we won't fetch the first
> tuple from query #2 until we finish reading the results of query #1.
> If the result of the Append node will be needed in its entirety, we
> really, really want to launch of those queries as early as possible.
> OTOH, if there's a Limit node with a small limit on top of the Append
> node, that could be quite wasteful.
That is the nature of speculative execution, but the Limit will be
pushed down to every Foreign Scan in the near future.
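The timing argument in the quoted Append example can be sketched with a
toy model. This is plain Python, not PostgreSQL code, and it rests on
assumptions of its own: each remote query needs a fixed `latency` after
it is started before its result is available, and local read time is
ignored. The function name and the latency values are hypothetical.

```python
# Toy model of an Append over several foreign scans; not PostgreSQL code.
# Assumption: each remote query needs `latency` time units after it is
# started before its whole result is available to the local node.

def append_finish_time(latencies, eager):
    """Return the simulated time at which Append has read every child."""
    now = 0.0
    for latency in latencies:
        # Eager: every query was launched at time 0, before the first fetch.
        # Lazy: a query is launched only when Append first reads from it.
        started_at = 0.0 if eager else now
        ready_at = started_at + latency
        # Append reads its children in order and blocks until one is ready.
        now = max(now, ready_at)
    return now


latencies = [3.0, 2.0, 4.0]  # illustrative values only
lazy_time = append_finish_time(latencies, eager=False)
eager_time = append_finish_time(latencies, eager=True)
print(lazy_time, eager_time)  # 9.0 4.0
```

Under these assumptions the lazy start costs the sum of the latencies,
while the early kick costs only the largest one.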
> We could decide not to care: after all, if our limit is
> satisfied, we can just bang the remote connections shut, and if
> they wasted some CPU, well, tough luck for them. But it would
> be nice to be smarter. I'm not sure how, though.
An appropriate fetch size will cap the harm, and for postgres_fdw the
case will be handled as I mentioned above.
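How the fetch size bounds the wasted work can be illustrated with
another toy model, again plain Python rather than postgres_fdw code. It
assumes that an early kick fetches exactly one batch per scan and that a
satisfied Limit abandons the queued rows and closes every cursor.
`RemoteCursor`, `wasted_rows` and all numbers are hypothetical.

```python
# Toy model: speculative first fetch on every scan, then a small Limit.
# Not postgres_fdw code; class and function names are hypothetical.

class RemoteCursor:
    """Stand-in for a remote cursor that is read in batches."""

    def __init__(self, total_rows, fetch_size):
        self.remaining = total_rows   # rows still on the remote side
        self.fetch_size = fetch_size
        self.queue = []               # rows fetched but not yet consumed
        self.next_row = 0

    def fetch_batch(self):
        n = min(self.fetch_size, self.remaining)
        self.queue.extend(range(self.next_row, self.next_row + n))
        self.next_row += n
        self.remaining -= n

    def next(self):
        # Fetch another batch only when the local queue has run dry.
        if not self.queue and self.remaining > 0:
            self.fetch_batch()
        return self.queue.pop(0) if self.queue else None

    def close(self):
        # Abandon the queued rows; return how many were thrown away.
        discarded = len(self.queue)
        self.queue.clear()
        return discarded


def wasted_rows(n_scans, rows_per_scan, fetch_size, limit):
    cursors = [RemoteCursor(rows_per_scan, fetch_size) for _ in range(n_scans)]
    for cursor in cursors:
        cursor.fetch_batch()          # early kick on every scan
    consumed = 0
    for cursor in cursors:            # Append order, stopped by the Limit
        while consumed < limit and cursor.next() is not None:
            consumed += 1
        if consumed >= limit:
            break
    return sum(cursor.close() for cursor in cursors)


print(wasted_rows(3, 1000, 100, 5))  # 295
print(wasted_rows(3, 1000, 10, 5))   # 25
```

In this model the rows thrown away never exceed one batch per scan, so a
smaller fetch size lowers the cost of a speculation that turns out to be
unnecessary.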
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center