From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-05-29 04:16:35 |
Message-ID: | CAA4eK1+skEe12C+7aDkd+D2UwuNy_OnVo2-r4_bJkNxSV3G0vQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, May 28, 2021 at 8:53 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
> >
> > I captured below information with the attached patch
> > 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> > patch set. Testing details are attached in the file named "test".
> > Total time spent in LockRelationForExtension
> > Total time spent in GetPageWithFreeSpace
> > Total time spent in RelationAddExtraBlocks
> > Total number of times extended the relation in bulk
> > Total number of times extended the relation by one block
> > Total number of blocks added in bulk extension
> > Total number of times getting the page from FSM
> >
>
> In your results, the number of pages each process is getting from FSM
> is not matching with the number of blocks added. I think we need to
> increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
> because that is also called and the process can get a free page via
> the same. The other thing to check via debugger is when one worker
> adds the blocks in bulk does another parallel worker gets all those
> blocks. You can achieve that by allowing one worker (say worker-1) to
> extend the relation in bulk and then let it wait and allow another
> worker (say worker-2) to proceed and see if it gets all the pages
> added by worker-1 from FSM. You need to keep the leader also waiting
> or not perform any operation.
>
While looking at results, I have observed one more thing that we are
trying to parallelize I/O due to which we might not be seeing benefit
in such cases. I think even for non-write queries there won't be any
(much) benefit if we can't parallelize CPU usage. Basically, the test
you are doing is for statement: explain analyze verbose create table
test as select * from tenk1;. Now, in this statement, there is no
qualification and still, the Gather node is generated for it, this
won't be the case if we check "select * from tenk1". Is it due to the
reason that the patch completely ignores the parallel_tuple_cost? But
still, it should prefer a serial plan due parallel_setup_cost, why is
that not happening? Anyway, I think we should not parallelize such
queries where we can't parallelize CPU usage. Have you tried the cases
without changing any of the costings for parallelism?
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2021-05-29 04:29:52 | Re: Decoding speculative insert with toast leaks memory |
Previous Message | Dean Gibson (DB Administrator) | 2021-05-29 04:08:28 | Re: AWS forcing PG upgrade from v9.6 a disaster |