Quick Links

Re: Index AM change proposals, redux

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Index AM change proposals, redux
Date:	2008-04-10 05:38:05
Message-ID:	47FDA7BD.3030404@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane wrote:
> * Proposed change to let both amgetnext and amgetmulti mark returned
> tuples as "candidate matches", that is in need of rechecking of quals
> against the real heap tuple. I had originally objected to this on
> the grounds that it would require setup work that doesn't happen now,
> but looking at the code I see that's wrong --- the setup work *does*
> happen anyway, see indexqualorig. nodeIndexscan.c comments "The
> indexqualorig expression is always initialized even though it will only
> be used in some uncommon cases --- would be nice to improve that".
> So this complaint is probably a red herring. I'm still a bit concerned
> by the fact that the patch only tracks this on a page basis in the
> amgetmulti case: if any of the tuples on a page are in-doubt then
> everything on the page will have to be rechecked. Is that good enough?

I think it's good enough. If we wanted to track it on a per-tuple basis,
we'd need to store two bits per tuple, AFAICS, doubling the size of the
bitmaps.

> A bigger issue is whether this is worth applying when nobody seems to be
> working on either of the main uses for it (bitmap indexes and GIT
> indexes). There seemed to be some possible marginal use for it in GIST
> indexes, but I'm not convinced that's a sufficient reason to complicate
> the APIs.

It has some merit on its own. Consider the case where you bitmap AND a
lossy page and a non-lossy one. We currently make the result lossy,
because we can't know which tuples on the page match, and then we need
to recheck all tuples on that page. With the support for candidate
matches, we could instead keep the non-lossy version of the bitmap and
mark the matches as candidates.

In the very best case, we might even apply a further AND to the page,
which eliminates all the remaining candidate matches, and skip the heap
access of that page altogether.

> * Proposed change to let amgetnext mark returned tuples as not
> necessarily in order, requiring somebody downstream to sort them again.
> I was pretty desperately unhappy with that because it required injection
> of sorting knowledge into code that really shouldn't have anything to do
> with it --- not least because not all indexscans even have a defined
> ordering. Given that it is only needed for one particular possible
> implementation of GIT, and Heikki was leaning away from that
> implementation anyway at last report, I vote against considering this
> any further.

Agreed, there's no use for that without GIT.

> * Proposed changes to refactor the TIDBitmap support to include a
> concept of a "stream bitmap" being read from disk. (Not strictly an
> index AM change, but closely related.) While this is clean and nice on
> its own, it doesn't seem to have any use unless bitmap indexes happen.
> If we leave the code sit it'll probably acquire bit rot, but I'm
> disinclined to add a bunch of unused code, too. Thoughts?

If the code is unused, it can easily bit rot even if it's in CVS. Let's
not apply it. The patch is out there if/when someone picks up the bitmap
index patch again.

> * Proposed changes to allow amgetnext to return the actual index keys,
> allowing certain types of "unindexable" quals to be checked without
> having to fetch the heap entry. For example a LIKE condition could be
> fully checked against the index entry. Since certain types of indexes
> (GIN now, possibly hash in future) are incapable of doing this, there'd
> need to be a way of marking index AMs (or perhaps individual indexes) as
> able or not able to return their keys, so that the planner could know
> whether quals could be pushed into the indexscan.
>
> We don't have any actual submitted patch for this, but AFAIR everyone
> agrees it's a good idea.

Agreed :-).

> * GIT (Grouped Index Tuple) indexes, which achieve index space savings
> in btrees by having a single index tuple represent multiple heap tuples
> (on a single heap page) containing a range of key values. I am not sure
> what the development status is --- Heikki had submitted a completed
> patch but there seemed to be agreement on making changes, and that's not
> been done AFAIK. The really serious problem I've got with it is that
> it'd foreclose the possibility of returning actual index keys from btree
> indexes, thus basically killing the usefulness of that idea. I'm not
> convinced it would offer enough gain to be worth paying that price.
> Another issue is that we'd need to check how much of the use-case for
> GIT has been taken over by HOT.

I don't think there's much overlap with HOT. On the contrary, HOT makes
GIT much more useful, because GIT was very sensitive to the cluster
order of the heap, and HOT helps with that.

There is, however, a ton of overlap with index-only scans, and the
possibility to return keys from indexes, as you pointed out.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Index AM change proposals, redux at 2008-04-10 00:30:49 from Tom Lane

Responses

Re: Index AM change proposals, redux at 2008-04-10 17:18:28 from Tom Lane
Re: Index AM change proposals, redux at 2008-04-12 00:08:21 from Ron Mayer

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Brendan Jurd	2008-04-10 05:52:57	Re: Commit fest queue
Previous Message	Tom Lane	2008-04-10 05:24:36	Re: Commit fest queue