
Re: CLUSTER and indisclustered

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: CLUSTER and indisclustered
Date:	2002-08-04 03:05:39
Message-ID:	Pine.LNX.4.21.0208041300250.25079-100000@linuxworld.com.au
Lists:	pgsql-hackers

On Sat, 3 Aug 2002, Bruce Momjian wrote:

> Gavin Sherry wrote:
> > Hi all,
> >
> > It occurred to me on the plane home that now that CLUSTER is fixed we may
> > be able to put pg_index.indisclustered to use. If CLUSTER were to set
> > indisclustered to true when it clusters a heap according to the given
> > index, we could speed up sequential scans. There are two possible ways.
> >
> > 1) Planner determines that a seqscan is appropriate *and* the retrieval is
> > qualified by the key(s) of one of the relation's indexes
> > 2) Planner determines that the relation is clustered on disk according to
> > the index over the key(s) used to qualify the retrieval
> > 3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
> > 4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
> > 5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
> > different from SeqNext) called SeqClusterNext
> > 6) SeqClusterNext() has all the heapgettup() logic with two
> > exceptions: a) we find the first tuple more intelligently (instead of
> > scanning from the first page) b) if we have found tuple(s) matching the
> > ScanKey when we encounter a non-matching tuple (via
> > HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
> > scan
>
> Gavin, is that a big win compared to just using the index and looping
> through the entries, knowing that the index matches are on the same
> page, and the heap matches are on the same page?

Bruce,

It would cut out the index overhead. Besides, at (1) above we would have
determined that an index scan was too expensive and we would be using a
SeqScan instead. This would just be faster, since a) we would locate the
tuples more intelligently and b) we wouldn't need to scan the whole heap once
we'd found all tuples matching the scan key.
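
To make the two savings concrete, here is a toy model in Python, not
PostgreSQL code. It assumes the heap is a list of non-empty pages whose keys
are physically in clustered (sorted) order, and it counts page reads. The
names build_heap, seq_scan and seq_cluster_scan are hypothetical, and the
binary search over pages is only one possible way to "locate the tuples more
intelligently"; the proposal above does not say how that would be done.

```python
from typing import List, Tuple

Page = List[int]  # toy page: just the key values, in clustered (sorted) order


def build_heap(keys: List[int], page_size: int) -> List[Page]:
    """Pack already-sorted keys into fixed-size pages (a stand-in for CLUSTER)."""
    return [keys[i:i + page_size] for i in range(0, len(keys), page_size)]


def seq_scan(heap: List[Page], key: int) -> Tuple[List[int], int]:
    """Plain sequential scan: every page is read, whatever the key."""
    matches = [k for page in heap for k in page if k == key]
    return matches, len(heap)


def seq_cluster_scan(heap: List[Page], key: int) -> Tuple[List[int], int]:
    """Scan that exploits clustering; returns (matches, page reads)."""
    pages_read = 0

    # a) Find the starting page: the first page whose last key is >= key.
    lo, hi = 0, len(heap)
    while lo < hi:
        mid = (lo + hi) // 2
        pages_read += 1  # probing a page counts as one read
        if heap[mid][-1] < key:
            lo = mid + 1
        else:
            hi = mid

    # b) Scan forward and stop at the first key past the target, since no
    #    later tuple can match in a clustered heap.
    matches: List[int] = []
    for page in heap[lo:]:
        pages_read += 1
        for k in page:
            if k == key:
                matches.append(k)
            elif k > key:
                return matches, pages_read
    return matches, pages_read


if __name__ == "__main__":
    heap = build_heap([i // 6 for i in range(400)], page_size=4)
    print(seq_scan(heap, 42))
    print(seq_cluster_scan(heap, 42))
```

Both functions return the same matches; the difference is in the second
element of the result, the number of page reads. The model ignores
visibility checks and assumes the heap has stayed perfectly ordered since
the last CLUSTER, which a real table only guarantees until it is next
modified.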

Gavin

In response to

Re: CLUSTER and indisclustered at 2002-08-04 02:57:33 from Bruce Momjian

Responses

Re: CLUSTER and indisclustered at 2002-08-04 03:21:45 from Bruce Momjian
Re: CLUSTER and indisclustered at 2002-08-04 03:39:10 from mark Kirkwood
