Re: [POC] Faster processing at Gather node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] Faster processing at Gather node
Date: 2017-05-20 02:18:28
Message-ID: CAA4eK1JExaqaRgT=UwiXqBCj8NhROJefDVO2BVwoNp6w4APYCA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 19, 2017 at 5:58 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 19, 2017 at 7:55 AM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> While analysing the performance of TPC-H queries for the newly developed
>> parallel operators, viz. parallel index scan, bitmap heap scan, etc., we
>> noticed that the time taken by the Gather node is significant. On
>> investigation, we found that the current method copies each tuple to the
>> shared queue and notifies the receiver. Since this copying is done in the
>> shared queue, there is a lot of locking and latching overhead.
>>
>> So, in this POC patch I tried to copy all the tuples to a local queue, thus
>> avoiding all the locks and latches. Once the local queue is filled to its
>> capacity, the tuples are transferred to the shared queue. Once all the
>> tuples are transferred, the receiver is notified.
>
> What if, instead of doing this, we switched the shm_mq stuff to use atomics?
>

That is one of the very first things we tried, but it didn't show
us any improvement, probably because sending tuples one at a time over
shm_mq is not cheap. Independently, we also tried to reduce the
frequency of SetLatch (used to notify the receiver), but that didn't
improve the results either. One thing that could still be tried is
to use atomics in shm_mq and also notify the receiver less often,
but I am not sure that would give better results than this idea.
A couple of other ideas have been tried to improve the speed of
Gather, such as avoiding the extra copy of the tuple that we make
before sending it (tqueueReceiveSlot->ExecMaterializeSlot) and
increasing the tuple queue size, but none of those showed any
noticeable improvement. I know all this because Dilip and I were
involved offlist in brainstorming ideas with Rafia to improve the
speed of Gather. It might have been better to share the results of
the ideas that didn't work out, but I guess Rafia didn't share them,
assuming that nobody would be interested in hearing what didn't
work out.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
