Re: Append with naive multiplexing of FDWs

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Append with naive multiplexing of FDWs
Date: 2019-11-30 19:26:11
Message-ID: 20191130192611.GB4326@momjian.us
Lists: pgsql-hackers

On Sun, Nov 17, 2019 at 09:54:55PM +1300, Thomas Munro wrote:
> On Sat, Sep 28, 2019 at 4:20 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Wed, Sep 4, 2019 at 06:18:31PM +1200, Thomas Munro wrote:
> > > A few years back[1] I experimented with a simple readiness API that
> > > would allow Append to start emitting tuples from whichever Foreign
> > > Scan has data available, when working with FDW-based sharding. I used
> > > that primarily as a way to test Andres's new WaitEventSet stuff and my
> > > kqueue implementation of that, but I didn't pursue it seriously
> > > because I knew we wanted a more ambitious async executor rewrite and
> > > many people had ideas about that, with schedulers capable of jumping
> > > all over the tree etc.
> > >
> > > Anyway, Stephen Frost pinged me off-list to ask about that patch, and
> > > asked why we don't just do this naive thing until we have something
> > > better. It's a very localised feature that works only between Append
> > > and its immediate children. The patch makes it work for postgres_fdw,
> > > but it should work for any FDW that can get its hands on a socket.
> > >
> > > Here's a quick rebase of that old POC patch, along with a demo. Since
> > > 2016, Parallel Append landed, but I didn't have time to think about
> > > how to integrate with that so I did a quick "sledgehammer" rebase that
> > > disables itself if parallelism is in the picture.
> >
> > Yes, sharding has been waiting on parallel FDW scans. Would this work
> > for parallel partition scans if the partitions were FDWs?
>
> Yeah, this works for partitions that are FDWs (as shown), but only for
> Append, not for Parallel Append. So you'd have parallelism in the
> sense that your N remote shard servers are all doing stuff at the same
> time, but it couldn't be in a parallel query on your 'home' server,
> which is probably good for things that push down aggregation and bring
> back just a few tuples from each shard, but bad for anything wanting
> to ship back millions of tuples to chew on locally. Do you think
> that'd be useful enough on its own?

Yes, I think so. There are many data warehouse queries that want to
return only aggregate values, or filter for a small number of rows.
Even OLTP queries might return only a few rows from multiple partitions.
This would allow for a proof-of-concept implementation so we can see how
realistic this approach is.
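
To make the quoted idea concrete, here is a minimal sketch of what "naive
multiplexing" means: wait on all child sockets at once and emit rows from
whichever one has data first. It is a Python illustration, not the patch
itself; the function names, the shard names, and the newline-delimited
"row" format are all assumptions made up for this example, and local
socket pairs stand in for the remote connections.

```python
import selectors
import socket


def multiplexed_append(sources):
    """Yield (name, row) pairs from whichever source socket is readable.

    `sources` maps a hypothetical shard name to a connected socket that
    delivers newline-terminated rows. This is a toy stand-in for Append
    pulling from several foreign scans, not PostgreSQL code.
    """
    sel = selectors.DefaultSelector()
    for name, sock in sources.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    buffers = {name: b"" for name in sources}
    open_count = len(sources)
    while open_count:
        # Block until at least one source has data (or has hit EOF).
        for key, _ in sel.select():
            chunk = key.fileobj.recv(4096)
            if not chunk:  # EOF: this source is exhausted
                sel.unregister(key.fileobj)
                open_count -= 1
                continue
            buffers[key.data] += chunk
            # Keep any trailing partial row in the buffer for next time.
            *rows, buffers[key.data] = buffers[key.data].split(b"\n")
            for row in rows:
                yield key.data, row.decode()
    sel.close()


def demo():
    """Drive two fake shards; shard_b answers before shard_a."""
    a_local, a_remote = socket.socketpair()
    b_local, b_remote = socket.socketpair()
    gen = multiplexed_append({"shard_a": a_local, "shard_b": b_local})
    out = []
    b_remote.sendall(b"b1\n")
    out.append(next(gen))  # only shard_b is ready at this point
    a_remote.sendall(b"a1\n")
    out.append(next(gen))
    a_remote.close()
    b_remote.close()
    out.extend(gen)  # drain until both sources report EOF
    a_local.close()
    b_local.close()
    return out
```

In the demo, shard_b's row comes out first even though shard_a is listed
first, which is the whole point: the consumer is never stuck waiting on
one slow child while another already has tuples.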

> The problem is that parallel safe non-partial plans (like postgres_fdw
> scans) are exclusively 'claimed' by one process under Parallel Append,
> so with the patch as posted, if you modify it to allow parallelism
> then it'll probably give correct answers but nothing prevents a single
> process from claiming and starting all the scans and then waiting for
> them to be ready, while the other processes miss out on doing any work
> at all. There's probably some kludgy solution involving not letting
> any one worker start more than X, and some space cadet solution
> involving passing sockets around and teaching libpq to hand over
> connections at certain controlled phases of the protocol (due to lack
> of threads), but nothing like that has jumped out as the right path so
> far.

I am unclear how many queries can do any meaningful work until all
shards have given their full results.
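
As for the claiming problem quoted above, a toy simulation may help show
why one process can end up with everything, and what the "no more than X
per worker" kludge would change. This is only a sketch under strong
assumptions: worker 0 always gets the first chance to claim, and the
function name and parameters are hypothetical, not anything from the
patch or from Parallel Append.

```python
def claim_scans(num_scans, num_workers, max_per_worker=None):
    """Simulate workers exclusively claiming non-partial scans.

    Worker 0 always gets the first chance at each scan, modelling one
    process that runs ahead of the others. Returns (claimed, unclaimed),
    where `claimed` maps worker id to the list of scan ids it owns and
    `unclaimed` lists scans nobody could take because of the cap.
    """
    claimed = {w: [] for w in range(num_workers)}
    unclaimed = []
    for scan in range(num_scans):
        for w in range(num_workers):
            if max_per_worker is None or len(claimed[w]) < max_per_worker:
                claimed[w].append(scan)
                break
        else:
            # Every worker is already at its cap.
            unclaimed.append(scan)
    return claimed, unclaimed


# No cap: the eager worker claims all six scans, the others sit idle.
greedy, _ = claim_scans(6, 3)

# Cap of 2 (the "X" in the quoted text): the scans spread out.
capped, _ = claim_scans(6, 3, max_per_worker=2)
```

With no cap, `greedy` gives worker 0 all six scans; with a cap of 2 each
of the three workers ends up with two. The cap obviously needs tuning
against the number of shards, which is presumably why it was called
kludgy.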

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
