Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-04 10:46:18
Message-ID: CAA4eK1+q0-greWrcgpsHU-QJZg7EGBDYcsnL9cqbn81A8V_Mpg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 4, 2021 at 6:26 AM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> > <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > > Execute EXPLAIN on Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
> > > QUERY PLAN
> > > ------------------------------------------------------------------------------------------------------------------------
> > > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
> > >   Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > >   -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > >        Output: test_data1.a, test_data1.b
> > >        Buffers: shared hit=5
> > > Planning:
> > >   Buffers: shared hit=27011
> > > Planning Time: 24.526 ms
> > > Execution Time: 44.981 ms
> > >
> > > Execute EXPLAIN on non-Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
> > > QUERY PLAN
> > > ------------------------------------------------------------------------------------------------------------------------
> > > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
> > >   Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > >   -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > >        Output: test_data1.a, test_data1.b
> > >        Buffers: shared hit=5
> > > Planning:
> > >   Buffers: shared hit=72
> > > Planning Time: 0.135 ms
> > > Execution Time: 79.058 ms
> > >
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why would this reduce the time with the patch? I
> > think this needs some investigation.
>
> I guess another factor other than shared buffers is relcache and catcache. The patched version loads those cached entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched version has to gradually build those cache entries during execution.
>

Right.

> How can we confirm its effect?
>

I am not sure, but if your theory is correct, then shouldn't both
versions show the same performance in consecutive runs within a session?
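
One way to check this could be sketched as below, assuming the test_part
and test_data1 tables from the test above. The reconnect and the
transaction wrapper are assumptions about how to set up the comparison,
not something that was run in this thread.

```sql
-- Sketch only: run on both the patched and the unpatched build.
-- \c reconnects in psql, so the new backend starts with cold
-- relcache/catcache entries for the partitions of test_part.
\c

-- Run 1: caches for the partitions are not loaded yet.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    INSERT INTO test_part SELECT * FROM test_data1;
ROLLBACK;  -- keep the table contents the same for the next run

-- Run 2, same session: the cache entries built during run 1
-- should already be present in this backend.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    INSERT INTO test_part SELECT * FROM test_data1;
ROLLBACK;
```

If the caching theory holds, the Planning Time plus Execution Time of the
second run should be close on both builds, and the gap should show up only
in the first run.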

--
With Regards,
Amit Kapila.
