From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-10-27 13:36:15 |
Message-ID: | CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the comments, please find my thoughts below.
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
>
> I took a look at the v8 patch set. Here are some comments:
>
> 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo()
> or PopulateCopyStateInfo()? And also EstimateCstateSize() --
> EstimateCStateSize(), PopulateCstateCatalogInfo() --
> PopulateCStateCatalogInfo()?
>
Changed as suggested.
> 2. Instead of mentioning numbers like 1024, 64K, 10240 in the
> comments, can we represent them in terms of macros?
> /* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */
> #define MAX_BLOCKS_COUNT 1024
> /*
> * It can hold upto 10240 record information for worker to process. RINGSIZE
>
Changed as suggested.
> 3. How about
> "
> Each worker at once will pick the WORKER_CHUNK_COUNT records from the
> DSM data blocks and store them in it's local memory.
> This is to make workers not contend much while getting record
> information from the DSM. Read RINGSIZE comments before
> changing this value.
> "
> instead of
> /*
> * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
> * block to process to avoid lock contention. Read RINGSIZE comments before
> * changing this value.
> */
>
Rephrased it.
> 4. How about one line gap before and after for comments: "Leader
> should operate in the following order:" and "Worker should operate in
> the following order:"
>
Changed it.
> 5. Can we move RAW_BUF_BYTES macro definition to the beginning of the
> copy.h where all the macro are defined?
>
Change was done as part of another commit & we are using as it is. I
preferred it to be as it is.
> 6. I don't think we need the change in toast_internals.c with the
> temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed));
> in GetCurrentCommandId()
>
Modified it.
> 7. I think
> /* Can't perform copy in parallel */
> if (parallel_workers <= 0)
> return NULL;
> can be
> /* Can't perform copy in parallel */
> if (parallel_workers == 0)
> return NULL;
> as parallel_workers can never be < 0 since we enter BeginParallelCopy
> only if cstate->nworkers > 0 and also we are not allowed to have
> negative values for max_worker_processes.
>
Modified it.
> 8. Do we want to pfree(cstate->pcdata) in case we failed to start any
> parallel workers, we would have allocated a good
> else
> {
> /*
> * Reset nworkers to -1 here. This is useful in cases where user
> * specifies parallel workers, but, no worker is picked up, so go
> * back to non parallel mode value of nworkers.
> */
> cstate->nworkers = -1;
> *processed = CopyFrom(cstate); /* copy from file to database */
> }
>
Added pfree.
> 9. Instead of calling CopyStringToSharedMemory() for each string
> variable, can't we just create a linked list of all the strings that
> need to be copied into shm and call CopyStringToSharedMemory() only
> once? We could avoid 5 function calls?
>
I feel keeping it this way makes the code more readable, and also this
is not in a performance intensive tight loop. I'm retaining the
change as is unless we feel this will make an impact.
> 10. Similar to above comment: can we fill all the required
> cstate->variables inside the function CopyNodeFromSharedMemory() and
> call it only once? In each worker we could save overhead of 5 function
> calls.
>
same as above.
> 11. Looks like CopyStringFromSharedMemory() and
> CopyNodeFromSharedMemory() do almost the same things except
> stringToNode() and pfree(destptr);. Can we have a generic function
> CopyFromSharedMemory() or something else and handle with flag "bool
> isnode" to differentiate the two use cases?
>
Removed CopyStringFromSharedMemory & used CopyNodeFromSharedMemory
appropriately. CopyNodeFromSharedMemory is renamed to
RestoreNodeFromSharedMemory keep the name consistent.
> 12. Can we move below check to the end in IsParallelCopyAllowed()?
> /* Check parallel safety of the trigger functions. */
> if (cstate->rel->trigdesc != NULL &&
> !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc))
> return false;
>
Modified.
> 13. CacheLineInfo(): Instead of goto empty_data_line_update; how about
> having this directly inside the if block as it's being used only once?
>
Have removed the goto by using a macro.
> 14. GetWorkerLine(): How about avoiding goto statements and replacing
> the common code with a always static inline function or a macro?
>
Have removed the goto by using a macro.
> 15. UpdateSharedLineInfo(): Below line is misaligned.
> lineInfo->first_block = blk_pos;
> lineInfo->start_offset = offset;
>
Changed it.
> 16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the
> start of for (;;)?
>
Added it.
> 17. Remove extra lines after #define IsHeaderLine()
> (cstate->header_line && cstate->cur_lineno == 1) in copy.h
>
Modified it.
Attached v9 patches have the fixes for the above comments.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v9-0001-Copy-code-readjustment-to-support-parallel-copy.patch | text/x-patch | 30.9 KB |
v9-0002-Allow-copy-from-command-to-process-data-from-file.patch | text/x-patch | 80.8 KB |
v9-0003-Documentation-for-parallel-copy.patch | text/x-patch | 2.7 KB |
v9-0004-Tests-for-parallel-copy.patch | text/x-patch | 34.5 KB |
v9-0005-Parallel-Copy-For-Binary-Format-Files.patch | text/x-patch | 23.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2020-10-27 13:45:13 | Re: SQL-standard function body |
Previous Message | Heikki Linnakangas | 2020-10-27 13:23:00 | Re: partition routing layering in nodeModifyTable.c |