From: | "Zeugswetter Andreas DAZ SD" <ZeugswetterA(at)spardat(dot)at> |
---|---|
To: | "Ron Peacetree" <rjpeace(at)earthlink(dot)net>, "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] A Better External Sort? |
Date: | 2005-09-29 13:28:27 |
Message-ID: | E1539E0ED7043848906A8FF995BDA5797EFA40@m0143.s-mxs.net |
Lists: pgsql-hackers pgsql-performance
> In my original example, a sequential scan of the 1TB of 2KB
> or 4KB records, => 250M or 500M records of data, being sorted
> on a binary value key will take ~1000x more time than reading
> in the ~1GB Btree I described that used a Key+RID (plus node
> pointers) representation of the data.
Imho you are ignoring the final step your algorithm needs: collecting the
data rows. After the keys are sorted, the collect step will effectively
access the tuples in random order (given a sufficiently large key range).

This random access is bad. In the time it takes, a competing algorithm can
read the whole data set at least 40 times sequentially, or write it 20
times sequentially. (Those are roughly the random/sequential throughput
ratios of modern disks.)
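
To put rough numbers on the collect step, here is a minimal back-of-envelope
sketch. All disk figures in it are my assumptions (roughly 2005-era commodity
drives: ~5 ms per random access, ~50 MB/s sequential), not measurements; the
data-set sizes are taken from the example quoted above:

    # Back-of-envelope cost model: random collect step vs. sequential passes.
    # Disk figures below are assumptions, not measurements.

    RANDOM_ACCESS_S = 0.005                  # assumed seek + rotational latency
    SEQ_THROUGHPUT_B_PER_S = 50 * 1024**2    # assumed sequential transfer rate

    DATA_BYTES = 1 * 1024**4     # 1 TB of table data (from the quoted example)
    TUPLE_BYTES = 4 * 1024       # 4 KB records => ~250M tuples
    NUM_TUPLES = DATA_BYTES // TUPLE_BYTES

    # Collect step: worst case, one random fetch per tuple
    # (large key range, so successive keys hit unrelated pages).
    collect_s = NUM_TUPLES * RANDOM_ACCESS_S

    # One full sequential pass over the table.
    seq_pass_s = DATA_BYTES / SEQ_THROUGHPUT_B_PER_S

    print(f"tuples to fetch:      {NUM_TUPLES:,}")
    print(f"random collect step:  {collect_s / 3600:,.0f} hours")
    print(f"one sequential pass:  {seq_pass_s / 3600:,.1f} hours")
    print(f"sequential passes possible in the same time: "
          f"{collect_s / seq_pass_s:,.0f}")

With those assumed figures the random collect alone costs on the order of 60
full sequential passes over the table, which is in the same ballpark as the
40x read ratio above.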
Andreas