Re: Custom Scan APIs (Re: Custom Plan node)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-03-04 04:11:25
Message-ID: CAFjFpRcZOfLLp6AhoD5MTMmwekByPhaGp1zAW20ZoFzgVin4Tw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 3, 2014 at 9:13 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> > On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > Erm, my thought was to use a select() loop which sends out I/O requests
> > > and then loops around waiting to see who finishes it. It doesn't
> > > parallelize the CPU cost of getting the rows back to the caller, but
> > > it'd at least parallelize the I/O, and if what's underneath is actually
> > > a remote FDW running a complex query (because the other side is actually
> > > a view), it would be a massive win to have all the remote FDWs executing
> > > concurrently instead of serially as we have today.
> >
> > I can't really make sense of this.
>
> Sorry, that was a bit hand-wavey since I had posted about it previously
> here:
>
> http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net
>
> It'd clearly be more involved than "just build a select() loop" and
> would require adding an async mechanism. I had been thinking about this
> primarily with the idea of FDWs and you're right that it'd require more
> thought to deal with getting data into/through shared_buffers. Still,
> since we seqscan into a ring buffer, I'd think we could make it work, but
> it would require additional work.
>
> > For FDWs, one idea might be to kick off the remote query at
> > ExecInitNode() time rather than ExecProcNode() time, at least if the
> > remote query doesn't depend on parameters that aren't available until
> > run time.
>
> Right, I had speculated about that also (option #2 in my earlier email).
>
>
During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, those queries would be fired while EXPLAINing a query
as well, which we want to avoid. Instead, we could run EXPLAIN on that query
at the foreign server, but again, not all foreign servers would be able to
EXPLAIN the query (e.g. file_fdw). Or we could avoid firing the query during
ExecInitNode() entirely when it's called for EXPLAIN (except perhaps for
EXPLAIN ANALYSE).
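For illustration, here is a minimal sketch of the "kick off at init time, but not for plain EXPLAIN" idea. PostgreSQL's executor passes eflags to node initialization, and the FDW docs already ask BeginForeignScan not to perform externally visible actions when EXEC_FLAG_EXPLAIN_ONLY is set. Everything below (the Sketch* struct, the flag constant's value, and the helper functions) is a hypothetical stand-in, not the real executor or FDW API. It also defers the remote query when it depends on run-time parameters, as Robert suggested.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's EXEC_FLAG_EXPLAIN_ONLY bit;
 * the value is chosen for this sketch only. */
#define SKETCH_EXEC_FLAG_EXPLAIN_ONLY 0x0001

/* Hypothetical, heavily simplified foreign-scan state. */
typedef struct SketchForeignScanState
{
    bool remote_query_started;  /* has the query been sent to the foreign server? */
    bool has_runtime_params;    /* does the query need values known only at run time? */
} SketchForeignScanState;

/* Hypothetical helper standing in for "send the query to the remote server". */
static void
sketch_send_remote_query(SketchForeignScanState *state)
{
    state->remote_query_started = true;
}

/*
 * Called at ExecInitNode() time in this sketch: start the remote query
 * early, unless this is a plain EXPLAIN or the query needs run-time params.
 */
static void
sketch_begin_foreign_scan(SketchForeignScanState *state, int eflags)
{
    state->remote_query_started = false;

    /* Plain EXPLAIN: perform no externally visible action. */
    if (eflags & SKETCH_EXEC_FLAG_EXPLAIN_ONLY)
        return;

    /* Parameters not available yet: defer until the first tuple is requested. */
    if (state->has_runtime_params)
        return;

    sketch_send_remote_query(state);
}

/* Called at ExecProcNode() time: start the query now if it was deferred. */
static void
sketch_iterate_foreign_scan(SketchForeignScanState *state)
{
    if (!state->remote_query_started)
        sketch_send_remote_query(state);
    /* ... fetch and return the next tuple (omitted) ... */
}
```

Note that the early start also fires for nodes that are initialized but never executed, which is the drawback Robert mentions below.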

> > That actually would allow multiple remote queries to run
> > simultaneously or in parallel with local work. It would also run them
> > in cases where the relevant plan node is never executed, which would
> > be bad but perhaps rare enough not to worry about.
>
> This was my primary concern, along with the fact that we explicitly say
> "don't do that" in the docs for the FDW API.
>
> > Or we could add a
> > new API like ExecPrefetchNode() that tells nodes to prepare to have
> > tuples pulled, and they can do things like kick off asynchronous
> > queries. But I still don't see any clean way for the Append node to
> > find out which one is ready to return results first.
>
> Yeah, that's tricky.
>
> Thanks,
>
> Stephen
>
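On the question of how an Append-like node could find out which child is ready first, here is a minimal sketch of the select() loop Stephen describes: it waits on several sources at once and consumes whichever becomes readable, rather than draining them serially. This is an assumption-laden illustration, not a proposed executor API: plain pipe descriptors stand in for remote connections, and the function name and SKETCH_MAX_SOURCES limit are hypothetical. With libpq, one would presumably issue each query with PQsendQuery(), obtain its descriptor with PQsocket(), and use PQconsumeInput()/PQisBusy() after select() reports it readable.

```c
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

#define SKETCH_MAX_SOURCES 16   /* hypothetical limit for this sketch */

/*
 * Hypothetical helper: read from every source in whatever order the sources
 * become readable, until each one reports EOF. Descriptors must be below
 * FD_SETSIZE. Returns the total number of bytes read, or -1 on error.
 * If order_out is non-NULL, it receives the index of each source as it
 * finishes.
 */
static long
sketch_drain_in_ready_order(const int *fds, int nfds, int *order_out)
{
    bool finished[SKETCH_MAX_SOURCES] = { false };
    int remaining = nfds;
    int nfinished = 0;
    long total = 0;
    char buf[256];

    if (nfds < 0 || nfds > SKETCH_MAX_SOURCES)
        return -1;

    while (remaining > 0)
    {
        fd_set readable;
        int maxfd = -1;

        /* Watch only the sources that have not finished yet. */
        FD_ZERO(&readable);
        for (int i = 0; i < nfds; i++)
        {
            if (!finished[i])
            {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }

        /* Block until at least one source has data or has hit EOF. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        for (int i = 0; i < nfds; i++)
        {
            if (finished[i] || !FD_ISSET(fds[i], &readable))
                continue;

            ssize_t n = read(fds[i], buf, sizeof(buf));
            if (n < 0)
                return -1;
            if (n == 0)
            {
                /* EOF: this source has returned all of its data. */
                finished[i] = true;
                remaining--;
                if (order_out != NULL)
                    order_out[nfinished] = i;
                nfinished++;
            }
            else
                total += n;
        }
    }
    return total;
}
```

This sketch does not retry on EINTR and only covers descriptor-based sources; as noted above, local scans going through shared_buffers would need a separate async mechanism.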

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
