Re: Tuplesort merge pre-reading

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>
Subject: Re: Tuplesort merge pre-reading
Date: 2017-04-14 18:10:12
Message-ID: CAH2-WznAMw64GuTz0eY=skxBOw1CKPdxG3vUJ05e1FeLAkDgYg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 14, 2017 at 5:57 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't think there's any one fixed answer, because increasing the
> number of tapes reduces I/O by adding CPU cost, and vice versa.

Sort of, but if you have to merge hundreds of runs (a situation that
should be quite rare), then you should be concerned about being CPU
bound first, as Knuth was. Besides, on modern hardware, read-ahead can
be more effective with more merge passes, up to a point, which might
also make the extra passes worth it -- using hundreds of tapes results
in plenty of *random* I/O. Plus, most of the time you only do a second
pass over a subset of the initial quicksorted runs -- not all of them.
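The tradeoff above can be sketched with a little arithmetic: with R
initial runs and a merge fanout of F (the number of runs merged at
once), you need roughly ceil(log_F(R)) passes over the data. This is a
hedged illustration, not tuplesort.c's actual cost model; the
function name and the 300-run figure are just for the example.

```python
import math

def merge_passes(num_runs: int, fanout: int) -> int:
    """Passes needed to combine num_runs sorted runs when at most
    `fanout` runs can be merged at once: ceil(log_fanout(num_runs))."""
    if num_runs <= 1:
        return 0
    return math.ceil(math.log(num_runs, fanout))

# A wide fanout merges hundreds of runs in a single pass, but reads
# from many tapes at once (more random I/O per pass). A narrow fanout
# needs more passes (more total I/O), but each pass reads sequentially
# from fewer tapes, so read-ahead works better.
for fanout in (7, 32, 300):
    print(f"fanout={fanout}: {merge_passes(300, fanout)} pass(es)")
```

Note that total I/O scales with the number of passes, so the CPU/I/O
balance shifts as you move along this curve.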

Probably the main complicating factor that Knuth doesn't care about is
time to return the first tuple -- startup cost. That was a big
advantage for commit df700e6 that I should have mentioned.

I'm not seriously suggesting that we should prefer multiple passes in
the vast majority of real world cases, nor am I suggesting that we
should go out of our way to help cases that need to do that. I just
find all this interesting.

--
Peter Geoghegan

