SP-GiST confusion: indexed column's type vs. index column type

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Subject: SP-GiST confusion: indexed column's type vs. index column type
Date: 2021-04-02 16:37:51
Message-ID: 3728741.1617381471@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While reviewing Pavel Borisov's patch to enable INCLUDE columns in
SP-GiST, I found some things that seem like pre-existing bugs.
These only accidentally fail to cause any problems in the existing
SP-GiST opclasses:

1. The attType passed to an opclass's config method is documented as

Oid attType; /* Data type to be indexed */

Now, I would read that as meaning the type of the underlying heap
column; the documentation and code about when a "compress" method
is required certainly seem to think so too. What is actually being
passed, though, is the data type of the index column, that is the
"opckeytype" of the index opclass. We've failed to notice this because
(1) for most of the core SP-GiST opclasses, the two types are the same,
and (2) none of the core opclasses bother to examine attType anyway.

2. When performing an index-only scan on an SP-GiST index, what we
pass back as the tuple descriptor of the output tuples is just the
index relation's own tupdesc, cf spgbeginscan:

/* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);

Again, what this is going to report is the opckeytype, not the
reconstructed heap column datatype. That's just flat out wrong.
We've failed to notice because the only core opclass for which
those types are different is poly_ops, which does not support
reconstructing the polygons for index-only scan.

We need to do something about this because the INCLUDE patch needs
the relation descriptor of an SP-GiST index to match the reality
of what is stored in the leaf tuples. Right now, as far as I can tell,
there isn't really any necessary connection between the atttype
claimed by the relation descriptor and the leaf type that's physically
stored. They're accidentally the same in all existing opclasses,
but only accidentally.

I propose changing things so that

(A) attType really is the input (heap) data type.

(B) We enforce that leafType agrees with the opclass opckeytype,
ensuring the index tupdesc can be used for leaf tuples.

(C) The tupdesc passed back for an index-only scan reports the
input (heap) data type.

This might be too much change for the back branches. Given the
lack of complaints to date, I think we can just fix it in HEAD.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-04-02 16:46:28 Re: SP-GiST confusion: indexed column's type vs. index column type
Previous Message Dmitry Dolgov 2021-04-02 16:06:36 Re: Asynchronous and "direct" IO support for PostgreSQL.