From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Brad DeJong <Brad(dot)Dejong(at)infor(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Small improvement to parallel query docs |
Date: | 2017-02-13 21:43:56 |
Message-ID: | CAKJS1f_1=kJGYR-VOAiMiS=zwWLT=wr8t8X0hiQ4NYSgG37Nhg@mail.gmail.com |
Lists: | pgsql-hackers |
On 14 February 2017 at 10:10, Brad DeJong <Brad(dot)Dejong(at)infor(dot)com> wrote:
> Robert Haas wrote:
>
>> + <literal>COUNT(*)</>, each worker must compute subtotals which later must
>> + be combined to produce an overall total in order to produce the final
>> + answer. If the query involves a <literal>GROUP BY</> clause,
>> + separate subtotals must be computed for each group seen by each parallel
>> + worker. Each of these subtotals must then be combined into an overall
>> + total for each group once the parallel aggregate portion of the plan is
>> + complete. This means that queries which produce a low number of groups
>> + relative to the number of input rows are often far more attractive to the
>> + query planner, whereas queries which don't collect many rows into each
>> + group are less attractive, due to the overhead of having to combine the
>> + subtotals into totals, of which cannot run in parallel.
>
>> I don't think "of which cannot run in parallel" is good grammar. I'm somewhat unsure whether the rest is an improvement or not. Other opinions?
>
> Does this read any more clearly?
>
> + <literal>COUNT(*)</>, each worker must compute subtotals which are later
> + combined in order to produce an overall total for the final answer. If
> + the query involves a <literal>GROUP BY</> clause, separate subtotals
> + must be computed for each group seen by each parallel worker. After the
> + parallel aggregate portion of the plan is complete, there is a serial step
> + where the group subtotals from all of the parallel workers are combined
> + into an overall total for each group. Because of the overhead of combining
> + the subtotals into totals, plans which produce few groups relative to the
> + number of input rows are often more attractive to the query planner
> + than plans which produce many groups relative to the number of input rows.
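The subtotal-and-combine scheme described in the quoted text can be sketched in Python. This is a hypothetical illustration, not PostgreSQL's implementation. Workers are simulated by slicing the input, and `partial_count`, `combine` and `parallel_count` are invented names. The sketch also reports how many per-group subtotals the serial combine step has to merge, which grows with the number of groups seen by each worker rather than with the number of input rows.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical rows: (group_key, value). COUNT(*) ignores the value.
Row = Tuple[str, int]


def partial_count(rows: Iterable[Row]) -> Counter:
    """Per-worker step: one COUNT(*) subtotal per group this worker saw."""
    subtotals: Counter = Counter()
    for key, _value in rows:
        subtotals[key] += 1
    return subtotals


def combine(partials: List[Counter]) -> Tuple[Counter, int]:
    """Serial step: merge every worker's subtotals into one total per group.

    Also returns how many subtotal entries were merged; this is the work
    that is not spread across workers in this sketch.
    """
    totals: Counter = Counter()
    merged_entries = 0
    for partial in partials:
        for key, n in partial.items():
            totals[key] += n
            merged_entries += 1
    return totals, merged_entries


def parallel_count(rows: List[Row], nworkers: int) -> Tuple[Counter, int]:
    # Hand the input rows out round-robin to the simulated workers.
    chunks = [rows[i::nworkers] for i in range(nworkers)]
    return combine([partial_count(chunk) for chunk in chunks])
```

For example, with 1000 rows spread over 3 groups and 4 workers, the combine step merges only 12 subtotals; with 1000 rows in 1000 distinct groups it merges 1000, so the parallel phase has done nothing to shrink the serial work.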
Actually, looking over this again, I think it's getting into too much
detail, which is already described in the next paragraph (which I
think is very clear). I propose we just remove the whole paragraph
and mention the planning and estimated-number-of-groups material in
a new paragraph.
I've attached a patch to this effect, which also removes the text
about why we don't support Merge Join. I felt something needed to be
written in its place, so I mentioned that identical hash tables are
created in each worker. This is perhaps not required, but the
paragraph seemed a bit empty without it. I also noticed a mistake:
"based on a column taken from the inner table". I assume this "inner"
should be "outer", since it surely must be talking about a
parameterised index scan, in which case the parameter comes from the
outer side, not the inner.
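The inner/outer point can be illustrated with a small Python sketch of a nested loop whose inner side is a parameterised index lookup. Everything here is hypothetical: the tables, the dict standing in for an index, and the names `build_index`, `worker_nested_loop` and `parallel_nested_loop` are invented for illustration and do not mirror PostgreSQL's executor. What it shows is that the lookup key for each inner scan is read from the current outer row, and that in this sketch only the outer rows are divided among the simulated workers.

```python
from typing import Dict, List, Tuple

# Hypothetical tables for illustration only.
# outer rows: (order_id, customer_id); inner rows: (customer_id, name)
Outer = Tuple[int, int]
Inner = Tuple[int, str]


def build_index(inner: List[Inner]) -> Dict[int, List[Inner]]:
    """Stand-in for an index on the inner table's join column."""
    index: Dict[int, List[Inner]] = {}
    for row in inner:
        index.setdefault(row[0], []).append(row)
    return index


def worker_nested_loop(outer_share: List[Outer],
                       inner_index: Dict[int, List[Inner]]) -> List[Tuple[int, str]]:
    """One worker's nested loop: the inner index scan is re-run once per
    outer row, and its parameter (the lookup key) comes from that outer row."""
    out = []
    for order_id, customer_id in outer_share:
        for _cust, name in inner_index.get(customer_id, []):
            out.append((order_id, name))
    return out


def parallel_nested_loop(outer: List[Outer], inner: List[Inner],
                         nworkers: int) -> List[Tuple[int, str]]:
    # Only the outer rows are split among the simulated workers; each
    # worker probes the same, complete inner index.
    index = build_index(inner)
    results: List[Tuple[int, str]] = []
    for w in range(nworkers):
        results.extend(worker_nested_loop(outer[w::nworkers], index))
    return sorted(results)
```

With orders `[(1, 10), (2, 20), (3, 10), (4, 99)]`, customers `[(10, "alice"), (20, "bob")]` and 2 workers, this yields `[(1, 'alice'), (2, 'bob'), (3, 'alice')]`; order 4 finds no match because its parameter value 99 is absent from the inner index.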
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
parallel_doc_fixes_v2.patch | application/octet-stream | 5.0 KB |