Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

From: Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no>
To: Joe Conway <mail(at)joeconway(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: How to run in parallel in Postgres, EXECUTE_PARALLEL
Date: 2019-12-08 21:59:51
Message-ID: HE1P189MB026617A76AEAF06FDADA20619D590@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> From: Joe Conway

> Sent: Sunday, December 8, 2019 9:04 PM

> To: Lars Aksel Opsahl; Laurenz Albe; pgsql-performance(at)lists(dot)postgresql(dot)org

> Subject: Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

>

> On 12/8/19 1:14 PM, Lars Aksel Opsahl wrote:

> > Do you or anybody know if there are any plans for a function call that

> > support the calling structure below or something like it and that then

> > could finish in 1 second ? (If you are calling a void function, the

> > return value should not be any problem.)

> >

> > DO

> > $body$

> > *DECLARE*

> > command_string_list text[3];

> > *BEGIN*

> > command_string_list[0] = 'SELECT pg_sleep(1)';

> > command_string_list[1] = 'SELECT pg_sleep(1)';

> > command_string_list[2] = 'SELECT pg_sleep(1)';

> > EXECUTE_PARALLEL command_string_list;

> > *END*

> > $body$;

> >

> > The only way to this today as I understand it, is to open 3 new

> > connections back to the database which you can be done in different ways.

>

> Yes, correct.

>

> > If we had a parallel functions like the one above it's easier to

> > make parallel sql without using complex scripts, java, python or other

> > system.

>

> It does require one connection per statement, but with dblink it is not

> necessarily all that complex. For example (granted, this could use more

> error checking, etc.):

>

> 8<----------------

> CREATE OR REPLACE FUNCTION

> execute_parallel(stmts text[])

> RETURNS text AS

> $$

> declare

> i int;

> retv text;

> conn text;

> connstr text;

> rv int;

> db text := current_database();

> begin

> for i in 1..array_length(stmts,1) loop

> conn := 'conn' || i::text;

> connstr := 'dbname=' || db;

> perform dblink_connect(conn, connstr);

> rv := dblink_send_query(conn, stmts[i]);

> end loop;

> for i in 1..array_length(stmts,1) loop

> conn := 'conn' || i::text;

> select val into retv

> from dblink_get_result(conn) as d(val text);

> end loop;

> for i in 1..array_length(stmts,1) loop

> conn := 'conn' || i::text;

> perform dblink_disconnect(conn);

> end loop;

> return 'OK';

> end;

> $$ language plpgsql;

> 8<----------------

>

> And then:

>

> 8<----------------

> \timing

> DO $$

> declare

> stmts text[];

> begin

> stmts[1] = 'select pg_sleep(1)';

> stmts[2] = 'select pg_sleep(1)';

> stmts[3] = 'select pg_sleep(1)';

> PERFORM execute_parallel(stmts);

> end;

> $$ LANGUAGE plpgsql;

> DO

> Time: 1010.831 ms (00:01.011)

> 8<----------------

>

> HTH,

>

> Joe

> --

> Crunchy Data - http://crunchydata.com

> PostgreSQL Support for Secure Enterprises

> Consulting, Training, & Open Source Development

Hi

Thanks a lot it works like a charm. https://github.com/larsop/find-overlap-and-gap/tree/use_dblink_for_parallel
(The test is failing now because it seems like drop EXTENSION dblink; is not cleaning up every thing)

As you say we need some error handling. And maybe some retry if not enough free connections and a parameter for max parallel connections and so on.

So far this is best solution I have seen.

Thanks.

Lars

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2019-12-09 04:13:07 Re: Specific query taking time to process
Previous Message Joe Conway 2019-12-08 20:04:05 Re: How to run in parallel in Postgres, EXECUTE_PARALLEL