From: | "Constantin S(dot) Pan" <kvapen(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] speeding up GIN build with parallel workers |
Date: | 2016-03-17 09:26:03 |
Message-ID: | 20160317122603.52337566@ppg |
Lists: | pgsql-hackers |
On Thu, 17 Mar 2016 13:21:32 +0530
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Mar 16, 2016 at 7:50 PM, Constantin S. Pan <kvapen(at)gmail(dot)com>
> wrote:
> >
> > On Wed, 16 Mar 2016 18:08:38 +0530
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > >
> > > Why backend just waits, why can't it does the same work as any
> > > worker does? In general, for other parallelism features the
> > > backend also behaves the same way as worker in producing the
> > > results if the results from workers is not available.
> >
> > We can make backend do the same work as any worker, but that
> > will complicate the code for less than 2% performance boost.
>
> Why do you think it will be just 2%? I think for single worker case,
> it should be much more as the master backend will be less busy in
> consuming tuples from tuple queue. I can't say much about
> code-complexity, as I haven't yet looked carefully at the logic of
> patch, but we didn't find much difficulty while doing it for parallel
> scans. One of the commit which might help you in understanding how
> currently heap scans are parallelised is
> ee7ca559fcf404f9a3bd99da85c8f4ea9fbc2e92, you can see if that can
> help you in anyway for writing a generic API for Gin parallel builds.
I looked at the timing details some time ago, which showed
that the backend spent about 1% of total time on data
transfer from 1 worker, and 3% on transfer and merging from
2 workers. So if we use (active backend + 1 worker) instead
of (passive backend + 2 workers), we still have to spend
1.5% on transfer and merging.

Or we can look at these measurements (from yesterday's
message):

wnum  mem(MB)  time(s)
   0       16      247
   1       16      256
   2       16      126

If 2 workers didn't have to transfer and merge their
results, they would have finished in 247 / 2 = 123.5
seconds. But they took 126 seconds, so the transfer and
merging cost another 2.5 seconds. The merging takes a
little longer than the transfer. If we now use
backend+worker we get rid of 1 transfer, but still have to
do 1 transfer and then merge, so we will save less than a
quarter of those 2.5 seconds.
In other words, we gain almost nothing by teaching the
backend how to be a worker.
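
As a rough check of the arithmetic above, here is a minimal sketch in Python. The function and variable names are only illustrative, and the "quarter" factor is just the upper bound stated above, not something measured separately.

```python
def parallel_overhead(serial_seconds, parallel_seconds, workers):
    """Time spent beyond a perfect split of the serial build time."""
    ideal_seconds = serial_seconds / workers
    return parallel_seconds - ideal_seconds


# Measurements from the table above (mem = 16 MB).
serial_seconds = 247.0       # wnum = 0
two_worker_seconds = 126.0   # wnum = 2

# Transfer + merge cost with 2 workers and a passive backend.
overhead_seconds = parallel_overhead(serial_seconds, two_worker_seconds, 2)

# Assumed upper bound on what an active backend could save:
# at most a quarter of that overhead.
max_saving_seconds = overhead_seconds / 4
max_saving_fraction = max_saving_seconds / two_worker_seconds

print(f"overhead: {overhead_seconds:.1f} s")
print(f"max saving: {max_saving_seconds:.3f} s "
      f"({max_saving_fraction:.2%} of the build)")
```

Under these assumptions the possible saving is a fraction of a second on a build of about two minutes.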
Regards,
Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company