From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-10-09 10:20:07 |
Message-ID: | CALj2ACXkxRYW77Vb+463FGHrGcbyNy0yW9JZcFuy15a3NCVaRA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > From the testing perspective,
> > > 1. Test by having something force_parallel_mode = regress which means
> > > that all existing Copy tests in the regression will be executed via
> > > new worker code. You can have this as a test-only patch for now and
> > > make sure all existing tests passed with this.
> > >
> >
> > I don't think all the existing copy test cases (except the new test cases added in the parallel copy patch set) would run inside a parallel worker if force_parallel_mode is on. This is because parallelism is picked up for copy only if the parallel option is specified, unlike parallelism for select queries.
> >
>
> Sure, you need to change the code such that when force_parallel_mode =
> 'regress' is specified then it always uses one worker. This is
> primarily for testing purposes and will help during the development of
> this patch as it will make all existing Copy tests use quite a good
> portion of the parallel infrastructure.
>
IIUC, I will first set force_parallel_mode = FORCE_PARALLEL_REGRESS
as the default value in guc.c, and then adjust the parallelism-related
code in copy.c so that it always picks one worker and spawns it. This
way, all the existing copy test cases would run in a parallel worker.
Please let me know if this is okay. If so, I will do this and post an
update here.
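
A minimal sketch of the worker-count decision described above, not code
from the patch set. The helper name choose_copy_workers and its
parameters are hypothetical; the enum only mirrors the values of the
force_parallel_mode GUC and is redefined here so the snippet compiles on
its own. It assumes an explicit parallel option always wins, and that
only 'regress' (not 'on') forces a worker.

```c
#include <assert.h>

/*
 * Stand-in for PostgreSQL's ForceParallelMode enum, redefined locally so
 * that this sketch is self-contained.
 */
typedef enum
{
    FORCE_PARALLEL_OFF,
    FORCE_PARALLEL_ON,
    FORCE_PARALLEL_REGRESS
} ForceParallelMode;

/*
 * Hypothetical helper (an assumption, not part of the patch set): decide
 * how many workers a COPY should use.
 *
 * requested_workers is the value of the parallel option, or 0 if the
 * option was not specified.
 */
static int
choose_copy_workers(int requested_workers, ForceParallelMode mode)
{
    assert(requested_workers >= 0);

    /* An explicitly requested worker count is used as-is. */
    if (requested_workers > 0)
        return requested_workers;

    /* No parallel option: force exactly one worker in regress mode. */
    if (mode == FORCE_PARALLEL_REGRESS)
        return 1;

    /* Otherwise fall back to the existing non-parallel copy path. */
    return 0;
}
```

With this shape, a plain COPY in the regression suite would get one
worker when force_parallel_mode = regress, and behave as it does today
for any other setting.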
>
> > All the above tests are performed on the latest v6 patch set (attached here in this thread) with custom postgresql.conf[1]. The results are of the triplet form (exec time in sec, number of workers, gain)
> >
>
> Okay, so I am assuming the performance is the same as we have seen
> with the earlier versions of patches.
>
Yes. Most recent run on v5 patch set [1]
>
> > Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests whenever a new set of patches is posted.
> >
> > 1. csv
> > 2. binary
>
> Don't we need the tests for plain text files as well?
>
Will add one.
>
> > 3. force parallel mode = regress
> > 4. toast data csv and binary
> > 5. foreign key check, before row, after row, before statement, after statement, instead of triggers
> > 6. partition case
> > 7. foreign partitions and partitions having trigger cases
> > 8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
> > 9. temp, global, local, unlogged, inherited tables cases, foreign tables
> >
>
> Sounds like good coverage. So, are you doing all this testing
> manually? How are you maintaining these tests?
>
Yes, we are running them manually. A few of the tests (1, 2, 4) require
huge datasets for performance measurements, and the other test cases are
there to ensure we don't choose parallelism. We will try to add the test
cases that are not meant for performance measurement to the patch's tests.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com