From: | Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | CLUSTER and indisclustered |
Date: | 2002-08-04 01:43:31 |
Message-ID: | Pine.LNX.4.21.0208041114560.21891-100000@linuxworld.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi all,
It occured to me on the plane home that now that CLUSTER is fixed we may
be able to put pg_index.indisclustered to use. If CLUSTER was to set
indisclustered to true when it clusters a heap according to the given
index, we could speed up sequantial scans. There are two possible ways.
1) Planner determines that a seqscan is appropriate *and* the retrieval is
qualified by the key(s) of one of the relation's indexes
2) Planner determines that the relation is clustered on disk according to
the index over the key(s) used to qualify the retrieval
3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
different from SeqNext) called SeqClusterNext
6) SeqClusterNext() has all the heapgettup() logic with two
exceptions: a) we find the first tuple more intelligently (instead of
scanning from the first page) b) if we have found tuple(s) matching the
ScanKey when we encounter an non-matching tuple (via
HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
scan
Any reason this isn't possible? Any reason it couldn't dramatically speed
up the performance of the type of query i've mentioned?
Gavin
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-08-04 02:01:12 | Re: PITR, checkpoint, and local relations |
Previous Message | Tom Lane | 2002-08-04 01:38:05 | Re: FUNC_MAX_ARGS benchmarks |