From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>
Subject: Re: Tuplesort merge pre-reading
Date: 2017-04-14 05:19:58
Message-ID: CAH2-WznrO1XQ5F3Mb+mWyrE_aY5DJWOFh=ePbw1BVi1=JoG9sQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 13, 2017 at 9:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm fairly sure that the point was exactly what it said, ie improve
> locality of access within the temp file by sequentially reading as many
> tuples in a row as we could, rather than grabbing one here and one there.
>
> It may be that the work you and Peter G. have been doing has rendered
> that question moot. But I'm a bit worried that the reason you're not
> seeing any effect is that you're only testing situations with zero seek
> penalty (ie your laptop's disk is an SSD). Back then I would certainly
> have been testing with temp files on spinning rust, and I fear that this
> may still be an issue in that sort of environment.

I actually think Heikki's work here would particularly help on
spinning rust, especially when less memory is available. He
specifically justified it on the grounds that it produces a more
sequential read pattern, particularly when multiple merge passes are
required.
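
To illustrate the idea, here is a minimal sketch in C -- hypothetical
names throughout (TapeReader, tape_refill, tape_next), not the actual
tuplesort.c/logtape.c interface. Each merge input keeps one large
pre-read buffer that is refilled with a single big sequential read, so
the temp file is touched in long contiguous chunks even though the
merge consumes tuples from many tapes in interleaved order:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    typedef struct TapeReader
    {
        int     fd;         /* temp file fd holding this tape's run(s) */
        char   *buf;        /* large pre-read buffer */
        size_t  bufsize;    /* capacity of buf */
        size_t  len;        /* bytes currently valid in buf */
        size_t  pos;        /* current read position within buf */
    } TapeReader;

    static bool
    tape_refill(TapeReader *t)
    {
        size_t  leftover = t->len - t->pos;
        ssize_t nread;

        /* Preserve any partial tuple, then top up with one big read. */
        memmove(t->buf, t->buf + t->pos, leftover);
        nread = read(t->fd, t->buf + leftover, t->bufsize - leftover);
        if (nread < 0)
            return false;       /* read error */
        t->len = leftover + (size_t) nread;
        t->pos = 0;
        return t->len > 0;
    }

    /* Copy the next fixed-size tuple into dst, refilling as needed. */
    static bool
    tape_next(TapeReader *t, char *dst, size_t tuplen)
    {
        if (t->pos + tuplen > t->len)
        {
            if (!tape_refill(t) || t->len < tuplen)
                return false;   /* tape exhausted */
        }
        memcpy(dst, t->buf + t->pos, tuplen);
        t->pos += tuplen;
        return true;
    }
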
> The larger picture to be drawn from that thread is that we were seeing
> very different performance characteristics on different platforms.
> The specific issue that Tatsuo-san reported seemed like it might be
> down to weird read-ahead behavior in a 90s-vintage Linux kernel ...
> but the point that this stuff can be environment-dependent is still
> something to take to heart.

BTW, I'm skeptical of Heikki's idea of killing polyphase merge itself
at this point. I think that keeping most tapes active in each pass is
useful now that our memory accounting hands an even share to each
maybe-active tape for every merge pass, something established by
Heikki's work on external sorting.
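
Concretely, that accounting amounts to something like the following
(a sketch with a hypothetical name, not the actual tuplesort.c code):

    #include <stddef.h>

    /*
     * Split the memory left over for buffering evenly among the tapes
     * that might be active in the upcoming merge pass, so that every
     * maybe-active input gets the same sized pre-read buffer.  E.g.
     * 64MB across 7 maybe-active tapes leaves a bit over 9MB per tape.
     */
    static size_t
    per_tape_buffer_size(size_t avail_mem, int maybe_active_tapes)
    {
        return avail_mem / (size_t) maybe_active_tapes;
    }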

Interestingly enough, I think that Knuth was pretty much spot on with
his "sweet spot" of 7 tapes, even on modern hardware. Commit df700e6
(where the sweet spot of merge order 7 was no longer always used) was
effective because it masked certain overheads that we experience when
doing multiple passes, overheads that Heikki and I have since mostly
removed. This was confirmed by Robert's testing of my merge order cap
work for commit fc19c18, where he found that using 7 tapes was only
slightly worse than using many hundreds of tapes. If we could somehow
make access to logical tapes perfectly sequential, then 7 tapes would
probably be noticeably *faster*, due to CPU caching effects.

Knuth was completely correct to say that it basically makes no
difference once more than 7 tapes are used for a merge, because he
didn't have logtape.c fragmentation to worry about.
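
A back-of-the-envelope way to see the diminishing returns (this is
balanced-merge arithmetic; polyphase distributes runs differently,
but the curve has the same shape):

    #include <math.h>
    #include <stdio.h>

    /*
     * passes = ceil(log_M(nruns)) for an M-way balanced merge.  Going
     * from 2 to 7 tapes saves many passes; going from 7 to hundreds
     * saves very few more.
     */
    int
    main(void)
    {
        const double nruns = 1000.0;    /* initial sorted runs */
        const int    orders[] = {2, 7, 100, 1000};

        for (int i = 0; i < 4; i++)
        {
            int passes = (int) ceil(log(nruns) / log((double) orders[i]));

            printf("merge order %4d -> %d passes\n", orders[i], passes);
        }
        /* Prints: 2 -> 10, 7 -> 4, 100 -> 2, 1000 -> 1 passes. */
        return 0;
    }
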
--
Peter Geoghegan
VMware vCenter Server
https://www.vmware.com/