Quick Links

Re: Merge algorithms for large numbers of "tapes"

From:	"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To:	"Dann Corbit" <DCorbit(at)connx(dot)com>
Cc:	gsstark(at)mit(dot)edu, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Merge algorithms for large numbers of "tapes"
Date:	2006-03-09 01:43:39
Message-ID:	36e682920603081743h463bb808y52eac97a2e1aff92@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

An interesting read at

http://www.vldb.org/conf/1997/P376.PDF

On 3/8/06, Dann Corbit <DCorbit(at)connx(dot)com> wrote:
>
> > -----Original Message-----
> > From: gsstark(at)mit(dot)edu [mailto:gsstark(at)mit(dot)edu]
> > Sent: Wednesday, March 08, 2006 3:56 PM
> > To: Luke Lonergan
> > Cc: Dann Corbit; Tom Lane; Jim C. Nasby; Simon Riggs; pgsql-
> > hackers(at)postgresql(dot)org
> > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
> >
> >
> > "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> >
> > > > I am pretty sure from this thread that PostgreSQL is not doing #1,
> and
> > I
> > > > have no idea if it is doing #2.
> > >
> > > Yep. Even Knuth says that the tape goo is only interesting from a
> > > historical perspective and may not be relevant in an era of disk
> drives.
> >
> > As the size of the data grows larger the behaviour of hard drives
> looks
> > more
> > and more like tapes. The biggest factor controlling the speed of i/o
> > operations is how many seeks are required to complete them.
> Effectively
> > "rewinds" are still the problem it's just that the cost of rewinds
> becomes
> > constant regardless of how long the "tape" is.
> >
> > That's one thing that gives me pause about the current approach of
> using
> > more
> > tapes. It seems like ideally the user would create a temporary work
> space
> > on
> > each spindle and the database would arrange to use no more than that
> > number of
> > tapes. Then each merge operation would involve only sequential access
> for
> > both
> > reads and writes.
>
> If the chief concern is in the number of subfiles created, replacement
> selection doubles the length of the subfiles while consuming no more
> memory.
> {The big-O of the algorithm sucks, though}
>
> It is certainly worth testing several cases.
>
> It is not a bad idea to enable more than one method of performing an
> operation.
>
> In the ideal case, you would have specific information about drives,
> spindles, rates for seek, transfer, etc.
>
> It all depends on how much effort you want to throw at it.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

In response to

Re: Merge algorithms for large numbers of "tapes" at 2006-03-09 00:18:39 from Dann Corbit

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim C. Nasby	2006-03-09 01:44:08	Re: Merge algorithms for large numbers of "tapes"
Previous Message	Jim C. Nasby	2006-03-09 01:31:39	Re: Merge algorithms for large numbers of "tapes"