Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-04 12:05:17
Message-ID: CAJcOf-fb3ZZf+bx-EwKREcOASksk5Tqw0ncxtYr4Pq+M=wjJmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 4, 2021 at 11:56 AM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is relcache and catcache. The patched version loads those cached entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched version has to gradually build those cache entries during execution. How can wee confirm its effect?
>

I believe that we can confirm its effect by invalidating relcache and
catcache, in both the patched and unpatched versions, just after the
parallel-safety checks are performed in the planner, and then running
tests and comparing the performance.

So that's exactly what I did (adding a call to
InvalidateSystemCaches() just after the parallel-safety checks in the
planner).
I found that then the unpatched version always performed better than
the patched version for tests inserting 1000 records into a table with
100,200,500 and 1000 partitions.
Looking at the breakdown of the timing for each Insert, the Planning
Time was always significantly more for the patched version (expected,
because it does extra checks), but the Execution Time was very similar
for both the patched and unpatched versions.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-02-04 12:08:21 Re: Is Recovery actually paused?
Previous Message Amit Kapila 2021-02-04 11:33:37 Re: DROP TABLE can crash the replication sync worker