From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | <gsstark(at)mit(dot)edu>, "Luke Lonergan" <llonergan(at)greenplum(dot)com> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Merge algorithms for large numbers of "tapes" |
Date: | 2006-03-09 00:18:39 |
Message-ID: | D425483C2C5C9F49B5B7A41F8944154757D5EC@postal.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: gsstark(at)mit(dot)edu [mailto:gsstark(at)mit(dot)edu]
> Sent: Wednesday, March 08, 2006 3:56 PM
> To: Luke Lonergan
> Cc: Dann Corbit; Tom Lane; Jim C. Nasby; Simon Riggs; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
>
> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>
> > > I am pretty sure from this thread that PostgreSQL is not doing #1,
and
> I
> > > have no idea if it is doing #2.
> >
> > Yep. Even Knuth says that the tape goo is only interesting from a
> > historical perspective and may not be relevant in an era of disk
drives.
>
> As the size of the data grows larger the behaviour of hard drives
looks
> more
> and more like tapes. The biggest factor controlling the speed of i/o
> operations is how many seeks are required to complete them.
Effectively
> "rewinds" are still the problem it's just that the cost of rewinds
becomes
> constant regardless of how long the "tape" is.
>
> That's one thing that gives me pause about the current approach of
using
> more
> tapes. It seems like ideally the user would create a temporary work
space
> on
> each spindle and the database would arrange to use no more than that
> number of
> tapes. Then each merge operation would involve only sequential access
for
> both
> reads and writes.
If the chief concern is in the number of subfiles created, replacement
selection doubles the length of the subfiles while consuming no more
memory.
{The big-O of the algorithm sucks, though}
It is certainly worth testing several cases.
It is not a bad idea to enable more than one method of performing an
operation.
In the ideal case, you would have specific information about drives,
spindles, rates for seek, transfer, etc.
It all depends on how much effort you want to throw at it.
From | Date | Subject | |
---|---|---|---|
Next Message | Jim C. Nasby | 2006-03-09 01:31:39 | Re: Merge algorithms for large numbers of "tapes" |
Previous Message | Greg Stark | 2006-03-08 23:55:59 | Re: Merge algorithms for large numbers of "tapes" |