Re: [PERFORM] A Better External Sort?

From: Ron Peacetree <rjpeace(at)earthlink(dot)net>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] A Better External Sort?
Date: 2005-09-29 06:21:10
Message-ID: 11604324.1127974870565.JavaMail.root@elwamui-polski.atl.sa.earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

>From: "Jeffrey W. Baker" <jwbaker(at)acm(dot)org>
>Sent: Sep 29, 2005 12:27 AM
>To: Ron Peacetree <rjpeace(at)earthlink(dot)net>
>Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
>You are engaging in a length and verbose exercise in mental
>masturbation, because you have not yet given a concrete example of a
>query where this stuff would come in handy. A common, general-purpose
>case would be the best.
>
??? I posted =3= specific classes of common, general-purpose query
operations where OES and the OES Btrees look like they should be
superior to current methods:
1= when splitting sorting or other operations across multiple CPUs
2= when doing joins of different tables by doing the join on these Btrees
rather than the original tables.
3= when the opportunity arises to reuse OES Btree results of previous
sorts for different keys in the same table. Now we can combine the
existing Btrees to obtain the new order based on the composite key
without ever manipulating the original, much larger, table.

In what way are these examples not "concrete"?

>We can all see that the method you describe might be a good way to sort
>a very large dataset with some known properties, which would be fine if
>you are trying to break the terasort benchmark. But that's not what
>we're doing here. We are designing and operating relational databases.
>So please explain the application.
>
This is a GENERAL method. It's based on CPU cache efficient Btrees that
use variable length prefix keys and RIDs.
It assumes NOTHING about the data or the system in order to work.
I gave some concrete examples for the sake of easing explanation, NOT
as an indication of assumptions or limitations of the method. I've even
gone out of my way to prove that no such assumptions or limitations exist.
Where in the world are you getting such impressions?

>Your main example seems to focus on a large table where a key column has
>constrained values. This case is interesting in proportion to the
>number of possible values. If I have billions of rows, each having one
>of only two values, I can think of a trivial and very fast method of
>returning the table "sorted" by that key: make two sequential passes,
>returning the first value on the first pass and the second value on the
>second pass. This will be faster than the method you propose.
>
1= No that was not my main example. It was the simplest example used to
frame the later more complicated examples. Please don't get hung up on it.

2= You are incorrect. Since IO is the most expensive operation we can do,
any method that makes two passes through the data at top scanning speed
will take at least 2x as long as any method that only takes one such pass.

>I think an important aspect you have failed to address is how much of
>the heap you must visit after the sort is complete. If you are
>returning every tuple in the heap then the optimal plan will be very
>different from the case when you needn't.
>
Hmmm. Not sure which "heap" you are referring to, but the OES Btree
index is provably the lowest (in terms of tree height) and smallest
possible CPU cache efficient data structure that one can make and still
have all of the traditional benefits associated with a Btree representation
of a data set.

Nonetheless, returning a RID, or all RIDs with(out) the same Key, or all
RIDs (not) within a range of Keys, or simply all RIDs in sorted order is
efficient. Just as should be for a Btree (actually it's a B+ tree variant to
use Knuth's nomenclature). I'm sure someone posting from acm.org
recognizes how each of these Btree operations maps to various SQL
features...

I haven't been talking about query plans because they are orthogonal to
the issue under discussion? If we use a layered model for PostgreSQL's
architecture, this functionality is more primal than that of a query
planner. ALL query plans that currently involve sorts will benefit from a
more efficient way to do, or avoid, sorts.

>PS: Whatever mailer you use doesn't understand or respect threading nor
>attribution. Out of respect for the list's readers, please try a mailer
>that supports these 30-year-old fundamentals of electronic mail.
>
That is an issue of infrastructure on the recieving side, not on the sending
(my) side since even my web mailer seems appropriately RFC conformant.
Everything seems to be going in the correct places and being properly
organized on archival.postgres.org ...

Ron

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2005-09-29 06:36:22 Re: [PATCHES] Proposed patch for sequence-renaming problems
Previous Message Thomas Hallgren 2005-09-29 06:17:15 Re: Server process exited with unexpected status 128.

Browse pgsql-performance by date

  From Date Subject
Next Message Magnus Hagander 2005-09-29 06:29:09 Re: Comparative performance
Previous Message Dennis Bjorklund 2005-09-29 05:38:38 Re: Comparative performance