From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Cc: | Greg Nancarrow <gregn4422(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Subject: | Re: Parallel copy |
Date: | 2020-09-22 09:14:21 |
Message-ID: | CALDaNm2rRNnLoC0g9z9aDzpz1FoAanv5u00ri1d+E8Y3of5rxg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
>
> I've spent some time today looking at your new set of patches and I've
> some thoughts and queries which I would like to put here:
>
> Why are these not part of the shared cstate structure?
>
> SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
> SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
> SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
> SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
>
I have used shared_cstate mainly to share the integer & bool data
types from the leader to worker process. The above data types are of
char* data type, I will not be able to use it like how I could do it
for integer type. So I preferred to send these as separate keys to the
worker. Thoughts?
> I think in the refactoring patch we could replace all the cstate
> variables that would be shared between the leader and workers with a
> common structure which would be used even for a serial copy. Thoughts?
>
Currently we are using shared_cstate only to share integer & bool data
types from leader to worker. Once worker retrieves the shared data for
integer & bool data types, worker will copy it to cstate. I preferred
this way because only for integer & bool we retrieve to shared_cstate
& copy it to cstate and for rest of the members any way we are
directly copying back to cstate. Thoughts?
> Have you tested your patch when encoding conversion is needed? If so,
> could you please point out the email that has the test results.
>
We have not yet done encoding testing, we will do and post the results
separately in the coming days.
> Apart from above, I've noticed some cosmetic errors which I am sharing here:
>
> +#define IsParallelCopy() (cstate->is_parallel)
> +#define IsLeader() (cstate->pcdata->is_leader)
>
> This doesn't look to be properly aligned.
>
Fixed.
> + shared_info_ptr = (ParallelCopyShmInfo *)
> shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
> + PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
>
> ..
>
> + /* Store shared build state, for which we reserved space. */
> + shared_cstate = (SerializedParallelCopyState
> *)shm_toc_allocate(pcxt->toc, est_cstateshared);
>
> In the first case, while typecasting you've added a space between the
> typename and the function but that is missing in the second case. I
> think it would be good if you could make it consistent.
>
Fixed
> Same comment applies here as well:
>
> + pg_atomic_uint32 line_state; /* line state */
> + uint64 cur_lineno; /* line number for error messages */
> +}ParallelCopyLineBoundary;
>
> ...
>
> + CommandId mycid; /* command id */
> + ParallelCopyLineBoundaries line_boundaries; /* line array */
> +} ParallelCopyShmInfo;
>
> There is no space between the closing brace and the structure name in
> the first case but it is in the second one. So, again this doesn't
> look consistent.
>
Fixed
> I could also find this type of inconsistency in comments. See below:
>
> +/* It can hold upto 10000 record information for worker to process. RINGSIZE
> + * should be a multiple of WORKER_CHUNK_COUNT, as wrap around cases
> is currently
> + * not handled while selecting the WORKER_CHUNK_COUNT by the worker. */
> +#define RINGSIZE (10 * 1000)
>
> ...
>
> +/*
> + * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
> + * block to process to avoid lock contention. Read RINGSIZE comments before
> + * changing this value.
> + */
> +#define WORKER_CHUNK_COUNT 50
>
> You may see these kinds of errors at other places as well if you scan
> through your patch.
Fixed.
Please find the attached v5 patch which has the fixes for the same.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Copy-code-readjustment-to-support-parallel-copy.patch | text/x-patch | 16.6 KB |
v5-0002-Framework-for-leader-worker-in-parallel-copy.patch | text/x-patch | 31.0 KB |
v5-0003-Allow-copy-from-command-to-process-data-from-file.patch | text/x-patch | 43.0 KB |
v5-0004-Documentation-for-parallel-copy.patch | text/x-patch | 2.0 KB |
v5-0005-Tests-for-parallel-copy.patch | text/x-patch | 19.7 KB |
v5-0006-Parallel-Copy-For-Binary-Format-Files.patch | text/x-patch | 26.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2020-09-22 09:37:57 | Re: OpenSSL 3.0.0 compatibility |
Previous Message | Christoph Berg | 2020-09-22 08:57:53 | Re: Binaries on s390x arch |