From: Peter Geoghegan <pg(at)heroku(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore
Date: 2015-11-30 21:04:43
Message-ID: CAM3SWZR5rv3+F3FOKf35=dti7oTmmcdFoe2voGuR0pddg3Jb+Q@mail.gmail.com
Lists: pgsql-hackers
On Sun, Nov 29, 2015 at 10:14 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I'm currently running some benchmarks on my external sorting patch on
> the POWER7 machine that Robert Haas and a few other people have been
> using for some time now [1]. So far, the benchmarks look very good
> across a variety of scales.
>
> I'll run a round of tests without the prefetching enabled (which the
> patch series makes further use of -- they're also used when writing
> tuples out). If there is no significant impact, I'll completely
> abandon this patch, and we can move on.
I took a look at this. It turns out that prefetching significantly helps
on the POWER7 system when sorting gensort tables of 50 million, 100
million, 250 million, and 500 million tuples (3 CREATE INDEX tests for
each case, 1GB maintenance_work_mem):
[pg(at)hydra gensort]$ cat test_output_patch_1gb.txt | grep "sort ended"
LOG: external sort ended, 171063 disk blocks used: CPU 4.33s/71.28u sec elapsed 75.75 sec
LOG: external sort ended, 171063 disk blocks used: CPU 4.30s/71.32u sec elapsed 75.91 sec
LOG: external sort ended, 171063 disk blocks used: CPU 4.29s/71.34u sec elapsed 75.69 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.10s/165.56u sec elapsed 174.35 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.07s/165.15u sec elapsed 173.70 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.01s/164.73u sec elapsed 174.84 sec
LOG: external sort ended, 855306 disk blocks used: CPU 23.65s/491.37u sec elapsed 522.44 sec
LOG: external sort ended, 855306 disk blocks used: CPU 21.13s/508.02u sec elapsed 530.48 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.63s/475.33u sec elapsed 499.09 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 47.99s/1016.78u sec elapsed 1074.55 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 46.52s/1015.25u sec elapsed 1078.23 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.34s/1013.26u sec elapsed 1067.16 sec
[pg(at)hydra gensort]$ cat test_output_patch_noprefetch_1gb.txt | grep "sort ended"
LOG: external sort ended, 171063 disk blocks used: CPU 4.79s/78.14u sec elapsed 83.03 sec
LOG: external sort ended, 171063 disk blocks used: CPU 3.85s/77.71u sec elapsed 81.64 sec
LOG: external sort ended, 171063 disk blocks used: CPU 3.94s/77.71u sec elapsed 81.71 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.88s/180.15u sec elapsed 189.69 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.30s/179.07u sec elapsed 187.92 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.29s/179.06u sec elapsed 188.02 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.16s/516.86u sec elapsed 541.35 sec
LOG: external sort ended, 855306 disk blocks used: CPU 21.66s/513.59u sec elapsed 538.00 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.56s/499.63u sec elapsed 525.53 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 45.00s/1062.26u sec elapsed 1118.52 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.42s/1061.33u sec elapsed 1117.27 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.47s/1064.93u sec elapsed 1118.79 sec
For example, the 50 million tuple test has over 8% of its runtime shaved
off (75.75 seconds versus 83.03 seconds, comparing the first run of each
case), and the prefetching build wins at every scale tested.
Note that only the writing of tuples uses prefetching here, because
that happens to be the only codepath that prefetching affects in this
workload (note also that this is the slightly different,
external-sort-specific version of the patch). I hesitate to give that
up, although prefetching noticeably matters less at higher scales,
where we're bottlenecked on the quicksorting itself rather than on
writing; those two costs grow at different rates, of course.
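To make the write-side idea concrete, here is a minimal sketch of
prefetching ahead of the write. The pg_prefetch_mem() macro and the
stripped-down dump loop are illustrative assumptions only, not the
patch itself; the real dumptuples() in tuplesort.c does considerably
more:

/* Illustrative sketch only, not the actual patch. */
#ifdef __GNUC__
/* Read prefetch (rw = 0), high temporal locality (locality = 3). */
#define pg_prefetch_mem(addr)   __builtin_prefetch((addr), 0, 3)
#else
#define pg_prefetch_mem(addr)   ((void) 0)
#endif

static void
dumptuples_sketch(Tuplesortstate *state, int tapenum)
{
    int     i;

    for (i = 0; i < state->memtupcount; i++)
    {
        /*
         * Hint the next SortTuple's out-of-line tuple into cache while
         * WRITETUP() works on the current one.  SortTuple.tuple points
         * to palloc'd memory that the next iteration will dereference.
         */
        if (i + 1 < state->memtupcount)
            pg_prefetch_mem(state->memtuples[i + 1].tuple);

        WRITETUP(state, tapenum, &state->memtuples[i]);
    }
    state->memtupcount = 0;
}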
Perhaps we can consider applying prefetching more selectively, only in
the context of writing out tuples. After all, the amount of useful work
that we can do while waiting on the fetch from memory ought to be more
predictable and manageable there, which could make prefetching a
reliable win. I will need to think about this some more.
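One concrete form that selectivity could take (again only a sketch;
PREFETCH_DISTANCE is a hypothetical knob, not anything in the patch) is
to issue the hint a fixed number of tuples ahead of the write, so that
the cache line arrives at about the time the loop reaches it:

/* Hypothetical variant of the loop body in the sketch above. */
#define PREFETCH_DISTANCE   4

        if (i + PREFETCH_DISTANCE < state->memtupcount)
            pg_prefetch_mem(state->memtuples[i + PREFETCH_DISTANCE].tuple);

Looking just one tuple ahead only pays off when WRITETUP() takes at
least as long as the memory fetch, so a small distance like this would
presumably need tuning per platform.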
--
Peter Geoghegan