Re: How to run in parallel in Postgres

From: Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: How to run in parallel in Postgres
Date: 2019-12-07 11:27:56
Message-ID: HE1P189MB0266944A7C861C8711453F7B9D5E0@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> From: Justin Pryzby <pryzby(at)telsasoft(dot)com>

> Sent: Saturday, December 7, 2019 2:25 AM

> To: Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no>

> Cc: pgsql-performance(at)lists(dot)postgresql(dot)org <pgsql-performance(at)lists(dot)postgresql(dot)org>

> Subject: Re: How to run in parallel in Postgres

>

> On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:

> > I have a function that prepares data, so the big job can be run it in parallel.

> >

> > Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

> >

> > If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that

> >

> > find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

> >

> > So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?

>

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);

> ...

>

> I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.

> Maybe you could consider rewriting it to return data to its caller instead.

> You'd also need to mark it as PARALLEL SAFE, of course.

> Your other functions involved should be PARALLEL SAFE too.

>

> Justin

Hi Justin

The reason why I don't return the results Is that on very bug tables I usually get memory problems if I return all the results to the master function. So I usually break thing up into small unlogged tables. Then I work on each table separately or in groups. When all steps are done i merge all the small tables together. I this case we only single step, but usually I work many more steps.

But I will keep mind that it may work i parallel if I don't create any child tables but returns the result.

Thanks.

Lars

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jean-David Beyer 2019-12-07 12:10:15 Re: Legal disclaimers on emails to this group
Previous Message Lars Aksel Opsahl 2019-12-07 11:17:25 Re: How to run in parallel in Postgres