From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-11-15 21:11:25
Message-ID: CAEepm=0GBJVFRdsjbhtjCWuQk=QzxrTUhhySnezq6FvYTdb=1A@mail.gmail.com
Lists: pgsql-hackers
On Thu, Nov 16, 2017 at 8:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 15, 2017 at 1:35 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> But this does bug me, and I think it's what made me pause here to make a
>> bad joke. The way that parallelism treats work_mem makes it even more
>> useless of a config knob than it was before. Parallelism, especially
>> after this patch, shouldn't compete / be benchmarked against a
>> single-process run with the same work_mem. To make it "fair" you need to
>> compare parallelism against a single threaded run with work_mem *
>> max_parallelism.
>
> I don't really know how to do a fair comparison between a parallel
> plan and a non-parallel plan. Even if the parallel plan contains zero
> nodes that use work_mem, it might still use more memory than the
> non-parallel plan, because a new backend uses a bunch of memory. If
> you really want a comparison that is fair on the basis of memory
> usage, you have to take that into account somehow.
>
> But even then, the parallel plan is also almost certainly consuming
> more CPU cycles to produce the same results. Parallelism is all about
> trading away efficiency for execution time. Not just because of
> current planner and executor limitations, but intrinsically, parallel
> plans are less efficient. The globally optimal solution on a system
> that is short on either memory or CPU cycles is to turn parallelism
> off.
The people who worked on the first attempt at Parallel Query for
Berkeley POSTGRES (and then ripped it out, moving on to another
project called XPRS, of which I have found no trace; perhaps it
finished up in some commercial RDBMS) wrote this[1]:
"The objective function that XPRS uses for query optimization is a
combination of resource consumption and response time as follows:
cost = resource consumption + w * response time
Here w is a system-specific weighting factor. A small w mostly
optimizes resource consumption, while a large w mostly optimizes
response time. Resource consumption is measured by the number of disk
pages accessed and number of tuples processed, while response time is
the elapsed time for executing the query."
[1] http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf
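
Just to make the weighting concrete, here is a minimal, self-contained
C sketch of that objective function applied to two made-up plan
estimates. The struct, function and numbers are purely illustrative
assumptions on my part, not anything taken from XPRS or PostgreSQL:

/*
 * Sketch of the XPRS-style weighted objective quoted above:
 *   cost = resource consumption + w * response time
 * All names and numbers are hypothetical.
 */
#include <stdio.h>

typedef struct PlanEstimate
{
    const char *name;
    double      resource_consumption;   /* e.g. pages read + tuples processed */
    double      response_time;          /* elapsed time for the query */
} PlanEstimate;

static double
weighted_cost(const PlanEstimate *plan, double w)
{
    return plan->resource_consumption + w * plan->response_time;
}

int
main(void)
{
    /* Hypothetical estimates: the parallel plan consumes more total
     * resources but finishes sooner. */
    PlanEstimate serial   = {"serial hash join",   1000.0, 10.0};
    PlanEstimate parallel = {"parallel hash join", 1400.0,  3.0};
    double      ws[] = {1.0, 100.0};
    int         i;

    for (i = 0; i < 2; i++)
    {
        double      w = ws[i];

        printf("w = %6.1f: %s = %.1f, %s = %.1f\n",
               w,
               serial.name, weighted_cost(&serial, w),
               parallel.name, weighted_cost(&parallel, w));
    }
    return 0;
}

With w = 1 the serial plan wins (1010 vs 1403); with w = 100 the
parallel plan wins (2000 vs 1700), which is exactly the resource
consumption versus response time trade-off Robert describes above.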
--
Thomas Munro
http://www.enterprisedb.com