Re: Merge algorithms for large numbers of "tapes"

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 06:00:35
Message-ID: C033B303.1EBBD%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,

On 3/7/06 8:03 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>> Two passes is the state-of-the-practice on external disk sorts.
>
> There is no such thing as a fixed number of passes regardless of
> available memory and size of the data.

While technically correct, the practice shows that two-passes is the norm
for the vast majority of cases since 1986:
http://research.microsoft.com/~Gray/papers/TandemTR86.3_FastSort.doc

Square root of the sort set is the memory requirement for a two pass sort.
10M for 10GB of sort set, for instance.

Given the current resource management limitations we live with, you are
correct about multi-pass being necessary, but we should be aware that modern
commercial databases allocate enough memory to provide two-pass external
sorts.

- Luke

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gourish Singbal 2006-03-08 07:54:32 Re: current segfault if autovauum is on
Previous Message Josh Berkus 2006-03-08 05:29:31 Re: PostgreSQL Anniversary Summit, Call for Contributions