From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Implied Functional Index use |
Date: | 2006-07-11 20:14:03 |
Message-ID: | 1152648843.2465.73.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Currently, functional indexes can be used by a query that explicitly
mentions the exact phrasing of the functional index within the WHERE
clause.
IMHO it is feasible to extend the range of WHERE clauses for which
functional indexes can be used by using implication, much in the same
way that we use transitive closure in join queries now. This has some
powerful and important uses, just one of which is index prefix
compression (so read on).
If we have a subclause within a WHERE clause of the form
col1 = k1 AND col2 = k2 AND ... colN = kN
this implies that the following clause is also true:
col1 = k1 AND col2 = k2 AND ... colN = kN
AND F(col1, col2 ... colN) = F(k1, k2 ... kN)
iff:
- the function F is IMMUTABLE, since that definition implies that the
output is always the same for any set of constant inputs.
- the operator that connects each col1,k1 pair must be defined as true
equality (note that in the above example what looks like the same
equality operator is in fact context dependent on the datatypes)
- the datatypes of each col1, k1 pair must match
If we have a Functional Index F(col1, col2...colN) then we can use the
second implied form of the subclause to recognise that the index can
be useful for the query. This would then allow a query plan that uses an
IndexScan on the functional index with a re-check filter of the original
query clause.
An example might be a query using the clause
surname = 'RIGGS'
would allow us to use an index defined as
metaphone(surname)
even though the original application was written before we added the
index. Note that the index would be significantly smaller than an index
on the full surname, as well as avoiding some of the hotspots caused by
certain most-frequent-values in the data.
An alternative example might be to define an index on
substr(text_col, 1, 100)
which still allows searching for longer strings without the need to
store the complete string in the index. This is effectively index prefix
compression, which is not directly possible with pgsql because of the
requirements of datatype encapsulation. (Note that this avoids the need
to have very long index keys also).
One difficulty to this is defining which operators represent
true-equality. There isn't a definition of this currently for operators
and we cannot assume that an operator called "=" has the required
properties, even if that is true for most built-in types. An example of
true-equality and its difficulties is for the FLOAT type minus-zero (-0)
and plus-zero (+0) have different byte representations yet when compared
with the standard FLOAT comparison operator +0 and -0 would be
considered equal. If we were to put each value through a hash-like
function such as md5() then we would clearly get different answers.
To take advantage of this, I propose to
- add code to be called from allpaths.c: when we check functional
indexes, if they exist and yet are not usable because of a lack of
explicit clauses we will check to see if there any clauses that can be
used to imply the correct use of the functional index. This is in many
ways similar to existing constraint exclusion code.
- add a new boolean to pg_operator to allow us to define which operators
offer true equality
- add a new keyword EQUALITY to CREATE/ALTER OPERATOR (which I think
implies HASHES and MERGES if they are not mentioned explicitly).
No promises for 8.2, but does anyone have further input?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Jan Wieck | 2006-07-11 20:18:20 | Re: lastval exposes information that currval does not |
Previous Message | Greg Stark | 2006-07-11 20:04:10 | Re: pgsql-patches considered harmful |