New xindex.sgml

From: "D'Arcy" "J(dot)M(dot)" Cain <darcy(at)druid(dot)net>
To: hackers(at)PostgreSQL(dot)org
Subject: New xindex.sgml
Date: 1999-05-27 13:58:14
Message-ID: m10n0fm-0000bIC@druid.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have made a number of changes to xindex.sgml. I reformatted it to make
the source a little easier to read. I also added the bits that I recently
discovered making my own user-defined type work with indeces in where
clauses. Can a few people pick at this and see if I am correct then
put it into the tree if it is OK?

And thanks for the help getting my type working.

<Chapter Id="xindex">
<Title>Interfacing Extensions To Indices</Title>

<Para>
The procedures described thus far let you define a new type, new
functions and new operators. However, we cannot yet define a secondary
index (such as a <Acronym>B-tree</Acronym>, <Acronym>R-tree</Acronym> or
hash access method) over a new type or its operators.
</Para>

<Para>
Look back at
<XRef LinkEnd="EXTEND-CATALOGS" EndTerm="EXTEND-CATALOGS">.
The right half shows the catalogs that we must modify in order to tell
<ProductName>Postgres</ProductName> how to use a user-defined type and/or
user-defined operators with an index (i.e., <FileName>pg_am, pg_amop,
pg_amproc, pg_operator</FileName> and <FileName>pg_opclass</FileName>).
Unfortunately, there is no simple command to do this. We will demonstrate
how to modify these catalogs through a running example: a new operator
class for the <Acronym>B-tree</Acronym> access method that stores and
sorts complex numbers in ascending absolute value order.
</Para>

<Para>
The <FileName>pg_am</FileName> class contains one instance for every user
defined access method. Support for the heap access method is built into
<ProductName>Postgres</ProductName>, but every other access method is
described here. The schema is

<TABLE TOCENTRY="1">
<Title>Index Schema</Title>
<TitleAbbrev>Indices</TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row>
<Entry>Attribute</Entry>
<Entry>Description</Entry>
</Row>
</THead>
<TBody>
<Row>
<Entry>amname</Entry>
<Entry>name of the access method</Entry>
</Row>
<Row>
<Entry>amowner</Entry>
<Entry>object id of the owner's instance in pg_user</Entry>
</Row>
<Row>
<Entry>amkind</Entry>
<Entry>not used at present, but set to 'o' as a place holder</Entry>
</Row>
<Row>
<Entry>amstrategies</Entry>
<Entry>number of strategies for this access method (see below)</Entry>
</Row>
<Row>
<Entry>amsupport</Entry>
<Entry>number of support routines for this access method (see below)</Entry>
</Row>
<Row>
<Entry>amgettuple
aminsert
...</Entry>

<Entry>procedure identifiers for interface routines to the access
method. For example, regproc ids for opening, closing, and
getting instances from the access method appear here. </Entry>
</Row>
</TBody>
</TGroup>
</TABLE>
</Para>

<Para>
The <Acronym>object ID</Acronym> of the instance in
<FileName>pg_am</FileName> is used as a foreign key in lots of other
classes. You don't need to add a new instance to this class; all
you're interested in is the <Acronym>object ID</Acronym> of the access
method instance you want to extend:

<ProgramListing>
SELECT oid FROM pg_am WHERE amname = 'btree';

+----+
|oid |
+----+
|403 |
+----+
</ProgramListing>
</Para>

<Para>
We will use that select in a where clause later.
</Para>

<Para>
The <FileName>amstrategies</FileName> attribute exists to standardize
comparisons across data types. For example, <Acronym>B-tree</Acronym>s
impose a strict ordering on keys, lesser to greater. Since
<ProductName>Postgres</ProductName> allows the user to define operators,
<ProductName>Postgres</ProductName> cannot look at the name of an operator
(eg, ">" or "<") and tell what kind of comparison it is. In fact,
some access methods don't impose any ordering at all. For example,
<Acronym>R-tree</Acronym>s express a rectangle-containment relationship,
whereas a hashed data structure expresses only bitwise similarity based
on the value of a hash function. <ProductName>Postgres</ProductName>
needs some consistent way of taking a qualification in your query,
looking at the operator and then deciding if a usable index exists. This
implies that <ProductName>Postgres</ProductName> needs to know, for
example, that the "<=" and ">" operators partition a
<Acronym>B-tree</Acronym>. <ProductName>Postgres</ProductName>
uses strategies to express these relationships between
operators and the way they can be used to scan indices.
</Para>

<Para>
Defining a new set of strategies is beyond the scope of this discussion,
but we'll explain how <Acronym>B-tree</Acronym> strategies work because
you'll need to know that to add a new operator class. In the
<FileName>pg_am</FileName> class, the amstrategies attribute is the
number of strategies defined for this access method. For
<Acronym>B-tree</Acronym>s, this number is 5. These strategies
correspond to

<TABLE TOCENTRY="1">
<Title>B-tree Strategies</Title>
<TitleAbbrev>B-tree</TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row>
<Entry>Operation</Entry>
<Entry>Index</Entry>
</Row>
</THead>
<TBody>
<Row>
<Entry>less than</Entry>
<Entry>1</Entry>
</Row>
<Row>
<Entry>less than or equal</Entry>
<Entry>2</Entry>
</Row>
<Row>
<Entry>equal</Entry>
<Entry>3</Entry>
</Row>
<Row>
<Entry>greater than or equal</Entry>
<Entry>4</Entry>
</Row>
<Row>
<Entry>greater than</Entry>
<Entry>5</Entry>
</Row>
</TBody>
</TGroup>
</TABLE>
</Para>

<Para>
The idea is that you'll need to add procedures corresponding to the
comparisons above to the <FileName>pg_amop</FileName> relation (see below).
The access method code can use these strategy numbers, regardless of data
type, to figure out how to partition the <Acronym>B-tree</Acronym>,
compute selectivity, and so on. Don't worry about the details of adding
procedures yet; just understand that there must be a set of these
procedures for <FileName>int2, int4, oid,</FileName> and every other
data type on which a <Acronym>B-tree</Acronym> can operate.

Sometimes, strategies aren't enough information for the system to figure
out how to use an index. Some access methods require other support
routines in order to work. For example, the <Acronym>B-tree</Acronym>
access method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
<Acronym>R-tree</Acronym> access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to user qualifications in
SQL queries; they are administrative routines used by
the access methods, internally.
</Para>

<Para>
In order to manage diverse support routines consistently across all
<ProductName>Postgres</ProductName> access methods,
<FileName>pg_am</FileName> includes an attribute called
<FileName>amsupport</FileName>. This attribute records the number of
support routines used by an access method. For <Acronym>B-tree</Acronym>s,
this number is one -- the routine to take two keys and return -1, 0, or
+1, depending on whether the first key is less than, equal
to, or greater than the second.
<Note>
<Para>
Strictly speaking, this routine can return a negative
number (< 0), 0, or a non-zero positive number (> 0).
</Para>
</Note>
</para>
<Para>
The <FileName>amstrategies</FileName> entry in pg_am is just the number
of strategies defined for the access method in question. The procedures
for less than, less equal, and so on don't appear in
<FileName>pg_am</FileName>. Similarly, <FileName>amsupport</FileName>
is just the number of support routines required by the access
method. The actual routines are listed elsewhere.
</Para>

<Para>
The next class of interest is pg_opclass. This class exists only to
associate a name and default type with an oid. In pg_amop, every
<Acronym>B-tree</Acronym> operator class has a set of procedures, one
through five, above. Some existing opclasses are <FileName>int2_ops,
int4_ops, and oid_ops</FileName>. You need to add an instance with your
opclass name (for example, <FileName>complex_abs_ops</FileName>) to
<FileName>pg_opclass</FileName>. The <FileName>oid</FileName> of
this instance is a foreign key in other classes.

<ProgramListing>
INSERT INTO pg_opclass (opcname, opcdeftype)
SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex_abs';

SELECT oid, opcname, opcdeftype
FROM pg_opclass
WHERE opcname = 'complex_abs_ops';

+------+-----------------+------------+
|oid | opcname | opcdeftype |
+------+-----------------+------------+
|17314 | complex_abs_ops | 29058 |
+------+-----------------+------------+
</ProgramListing>

Note that the oid for your <FileName>pg_opclass</FileName> instance will
be different! Don't worry about this though. We'll get this number
from the system later just like we got the oid of the type here.
</Para>

<Para>
So now we have an access method and an operator class.
We still need a set of operators; the procedure for
defining operators was discussed earlier in this manual.
For the complex_abs_ops operator class on Btrees,
the operators we require are:

<ProgramListing>
absolute value less-than
absolute value less-than-or-equal
absolute value equal
absolute value greater-than-or-equal
absolute value greater-than
</ProgramListing>
</Para>

<Para>
Suppose the code that implements the functions defined
is stored in the file
<FileName>PGROOT/src/tutorial/complex.c</FileName>
</Para>

<Para>
Part of the code look like this: (note that we will only show the
equality operator for the rest of the examples. The other four
operators are very similar. Refer to <FileName>complex.c</FileName>
or <FileName>complex.source</FileName> for the details.)

<ProgramListing>
#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)

bool
complex_abs_eq(Complex *a, Complex *b)
{
double amag = Mag(a), bmag = Mag(b);
return (amag==bmag);
}
</ProgramListing>
</Para>

<Para>
There are a couple of important things that are happening below.
</Para>

<Para>
First, note that operators for less-than, less-than-or equal, equal,
greater-than-or-equal, and greater-than for <FileName>int4</FileName>
are being defined. All of these operators are already defined for
<FileName>int4</FileName> under the names &lt;, &lt;=, =, &gt;=,
and &gt;. The new operators behave differently, of course. In order
to guarantee that <ProductName>Postgres</ProductName> uses these
new operators rather than the old ones, they need to be named differently
from the old ones. This is a key point: you can overload operators in
<ProductName>Postgres</ProductName>, but only if the operator isn't
already defined for the argument types. That is, if you have &lt;
defined for (int4, int4), you can't define it again.
<ProductName>Postgres</ProductName> does not check this when you define
your operator, so be careful. To avoid this problem, odd names will be
used for the operators. If you get this wrong, the access methods
are likely to crash when you try to do scans.
</Para>

<Para>
The other important point is that all the operator functions return
Boolean values. The access methods rely on this fact. (On the other
hand, the support function returns whatever the particular access method
expects -- in this case, a signed integer.) The final routine in the
file is the "support routine" mentioned when we discussed the amsupport
attribute of the <FileName>pg_am</FileName> class. We will use this
later on. For now, ignore it.
</Para>

<Para>
<ProgramListing>
CREATE FUNCTION complex_abs_eq(complex_abs, complex_abs)
RETURNS bool
AS 'PGROOT/tutorial/obj/complex.so'
LANGUAGE 'c';
</ProgramListing>
</Para>

<Para>
Now define the operators that use them. As noted, the operator names
must be unique among all operators that take two <FileName>int4</FileName>
operands. In order to see if the operator names listed below are taken,
we can do a query on <FileName>pg_operator</FileName>:

<ProgramListing>
/*
* this query uses the regular expression operator (~)
* to find three-character operator names that end in
* the character &amp;
*/
SELECT *
FROM pg_operator
WHERE oprname ~ '^..&amp;$'::text;
</ProgramListing>

</Para>

<Para>
to see if your name is taken for the types you want. The important
things here are the procedure (which are the <Acronym>C</Acronym>
functions defined above) and the restriction and join selectivity
functions. You should just use the ones used below--note that there
are different such functions for the less-than, equal, and greater-than
cases. These must be supplied, or the access method will crash when it
tries to use the operator. You should copy the names for restrict and
join, but use the procedure names you defined in the last step.

<ProgramListing>
CREATE OPERATOR = (
leftarg = complex_abs, rightarg = complex_abs,
procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
)
</ProgramListing>
</Para>

<Para>
Notice that five operators corresponding to less, less equal, equal,
greater, and greater equal are defined.
</Para>

<Para>
We're just about finished. the last thing we need to do is to update
the <FileName>pg_amop</FileName> relation. To do this, we need the
following attributes:

<TABLE TOCENTRY="1">
<Title><FileName>pg_amproc</FileName> Schema</Title>
<TitleAbbrev><FileName>pg_amproc</FileName></TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row>
<Entry>Attribute</Entry>
<Entry>Description</Entry>
</Row>
</THead>
<TBody>
<Row>
<Entry>amopid</Entry>
<Entry>the <FileName>oid</FileName> of the <FileName>pg_am</FileName> instance
for B-tree (== 403, see above)</Entry>
</Row>
<Row>
<Entry>amopclaid</Entry>
<Entry>the <FileName>oid</FileName> of the
<FileName>pg_opclass</FileName> instance for <FileName>complex_abs_ops</FileName>
(== whatever you got instead of <FileName>17314</FileName>, see above)</Entry>
</Row>
<Row>
<Entry>amopopr</Entry>
<Entry>the <FileName>oid</FileName>s of the operators for the opclass
(which we'll get in just a minute)</Entry>
</Row>
<Row>
<Entry>amopselect, amopnpages</Entry>
<Entry>cost functions</Entry>
</Row>
</TBody>
</TGroup>
</TABLE>

The cost functions are used by the query optimizer to decide whether or
not to use a given index in a scan. Fortunately, these already exist.
The two functions we'll use are <FileName>btreesel</FileName>, which
estimates the selectivity of the <Acronym>B-tree</Acronym>, and
<FileName>btreenpage</FileName>, which estimates the number of pages a
search will touch in the tree.
</Para>

<Para>
So we need the <FileName>oid</FileName>s of the operators we just
defined. We'll look up the names of all the operators that take
two <FileName>complex</FileName>es, and pick ours out:

<ProgramListing>
SELECT o.oid AS opoid, o.oprname
INTO TABLE complex_ops_tmp
FROM pg_operator o, pg_type t
WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex_abs';

+------+---------+
|oid | oprname |
+------+---------+
|17321 | &lt; |
+------+---------+
|17322 | &lt;= |
+------+---------+
|17323 | = |
+------+---------+
|17324 | &gt;= |
+------+---------+
|17325 | &gt; |
+------+---------+
</ProgramListing>

(Again, some of your <FileName>oid</FileName> numbers will almost
certainly be different.) The operators we are interested in are those
with <FileName>oid</FileName>s 17321 through 17325. The values you
get will probably be different, and you should substitute them for the
values below. We will do this with a select statement.
</Para>

<Para>
Now we're ready to update <FileName>pg_amop</FileName> with our new
operator class. The most important thing in this entire discussion
is that the operators are ordered, from less equal through greater
equal, in <FileName>pg_amop</FileName>. We add the instances we need:

<ProgramListing>
INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
amopselect, amopnpages)
SELECT am.oid, opcl.oid, c.opoid, 1,
'btreesel'::regproc, 'btreenpage'::regproc
FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
WHERE amname = 'btree' AND
opcname = 'complex_abs_ops' AND
c.oprname = '<';
</ProgramListing>

Now do this for the other operators substituting for the "1" in the
third line above and the "<" in the last line. Note the order:
"less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
than or equal" is 4, and "greater than" is 5.
</Para>

<Para>
The next step is registration of the "support routine" previously
described in our discussion of <FileName>pg_am</FileName>. The
<FileName>oid</FileName> of this support routine is stored in the
<FileName>pg_amproc</FileName> class, keyed by the access method
<FileName>oid</FileName> and the operator class <FileName>oid</FileName>.
First, we need to register the function in
<ProductName>Postgres</ProductName> (recall that we put the
<Acronym>C</Acronym> code that implements this routine in the bottom of
the file in which we implemented the operator routines):

<ProgramListing>
CREATE FUNCTION complex_abs_cmp(complex, complex)
RETURNS int4
AS 'PGROOT/tutorial/obj/complex.so'
LANGUAGE 'c';

SELECT oid, proname FROM pg_proc
WHERE proname = 'complex_abs_cmp';

+------+-----------------+
|oid | proname |
+------+-----------------+
|17328 | complex_abs_cmp |
+------+-----------------+
</ProgramListing>

(Again, your <FileName>oid</FileName> number will probably be different
and you should substitute the value you see for the value below.)
We can add the new instance as follows:

<ProgramListing>
INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
SELECT a.oid, b.oid, c.oid, 1
FROM pg_am a, pg_opclass b, pg_proc c
WHERE a.amname = 'btree' AND
b.opcname = 'complex_abs_ops' AND
c.proname = 'complex_abs_cmp';
</ProgramListing>
</Para>

<Para>
Now we need to add a hashing strategy to allow the type to be indexed.
We do this by using another type in pg_am but we reuse the sames ops.

<ProgramListing>
INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
amopselect, amopnpages)
SELECT am.oid, opcl.oid, c.opoid, 1,
'hashsel'::regproc, 'hashnpage'::regproc
FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
WHERE amname = 'hash' AND
opcname = 'complex_abs_ops' AND
c.oprname = '=';
</ProgramListing>
</Para>

<Para>
In order to use this index in a where clause, we need to modify the
<FileName>pg_operator</FileName> class as follows.

<ProgramListing>
UPDATE pg_operator
SET oprrest = 'eqsel'::regproc, oprjoin = 'eqjoinsel'
WHERE oprname = '=' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
WHERE oprname = '<>' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
WHERE oprname = '<>' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'
WHERE oprname = '<' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'
WHERE oprname = '<=' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'
WHERE oprname = '>' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');

UPDATE pg_operator
SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'
WHERE oprname = '>=' AND
oprleft = oprright AND
oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
</ProgramListing>
</Para>

<Para>
And last (Finally!) we register a description of this type.

<ProgramListing>
INSERT INTO pg_description (objoid, description)
SELECT oid, 'Two part G/L account'
FROM pg_type WHERE typname = 'complex_abs';
</ProgramListing>
</Para>

</Chapter>

--
D'Arcy J.M. Cain <darcy(at){druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-05-27 14:02:08 Re: [HACKERS] Uh-oh II - ecpg
Previous Message Tom Lane 1999-05-27 13:56:56 Re: [HACKERS] Help: fmgr_info: function 0: cache lookup failed