From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore |
Date: | 2015-11-30 06:14:34 |
Message-ID: | CAM3SWZTPCnVjUGxvCr-Lyi2CNS3954QFKbWOPQeLi=r3k5Ee2A@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Nov 29, 2015 at 8:52 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> You're right, gcc did not include the prefetch instructions.
> I've tested again on the same machine but with clang 3.7 instead of gcc
> 4.8.3
Thanks for going to the trouble of investigating this.
These results are mediocre -- in general, it seems like the prefetch
instructions are almost as likely to hurt as to help. Naturally, I
don't want to go down the road of considering every microarchitecture,
and that seems to be what it would take to get this to work well, if
that's possible at all.
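For readers unfamiliar with the technique under discussion, here is a minimal sketch of software prefetching while walking an array sequentially. It assumes the array entries hold pointers to separately allocated tuple bodies. The array itself is read in order, which the hardware prefetcher usually handles well. The pointed-to tuples may be scattered in memory, so the loop asks for the tuple a few slots ahead to be pulled into cache. `SortTupleSketch`, `PREFETCH_DISTANCE`, `PREFETCH_READ`, `DISABLE_PREFETCH` and `sum_tuples` are hypothetical names for illustration only. They are not PostgreSQL's actual `SortTuple` definition or the macros in the patch under discussion. `__builtin_prefetch` is a real GCC/clang builtin. Whether a given compiler and flags actually emit an instruction for it (on x86-64, `prefetcht0` for locality 3) can be checked in the `-S` assembly output.

```c
#include <stddef.h>

/* Hypothetical stand-in for a SortTuple-like entry: a pointer to the
 * tuple body plus a cached leading key. Not PostgreSQL's definition. */
typedef struct
{
    void   *tuple;
    long    datum1;
} SortTupleSketch;

/* Hypothetical prefetch distance, in array elements (an assumption;
 * the best value is exactly the microarchitecture-dependent knob). */
#define PREFETCH_DISTANCE 3

/* Compile with -DDISABLE_PREFETCH to benchmark without prefetching. */
#if !defined(DISABLE_PREFETCH) && (defined(__GNUC__) || defined(__clang__))
/* rw = 0 (read), locality = 3 (keep in all cache levels) */
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void) 0)
#endif

/* Sequentially visit every tuple, prefetching the tuple body that will
 * be needed PREFETCH_DISTANCE iterations from now. Each tuple body is
 * assumed (for this sketch) to start with a long. */
static long
sum_tuples(const SortTupleSketch *tuples, size_t n)
{
    long    total = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(tuples[i + PREFETCH_DISTANCE].tuple);
        total += *(const long *) tuples[i].tuple;
    }
    return total;
}
```

A prefetch is only a hint: it never changes the result, only (possibly) the timing. That is why the same code can help on one CPU and hurt on another, for example by issuing prefetches for lines that are already cached or by evicting useful data.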
I'm currently running some benchmarks on my external sorting patch on
the POWER7 machine that Robert Haas and a few other people have been
using for some time now [1]. So far, the benchmarks look very good
across a variety of scales.
I'll run a round of tests without prefetching enabled (the patch series
makes further use of prefetch instructions -- they're also used when
writing tuples out). If there is no significant impact, I'll completely
abandon this patch, and we can move on.
[1] http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
--
Peter Geoghegan