Quick Links

Re: spgist text_ops and LIKE

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: spgist text_ops and LIKE
Date:	2012-02-02 19:07:21
Message-ID:	CAPpHfduR4898K+heSgBtTyiNoUF1fqOdo3KTHvmnQzf1CFU69g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Feb 2, 2012 at 10:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I'm having trouble figuring out under what set of circumstances spgist
> is expected to be the best available alternative. It only supports a
> small subset of the data types that GiST does, so I suppose the point
> is that it should be faster for the cases that it does handle. And,
> considering that this is all brand-new code, the fact that it's almost
> keeping pace with btree on both pattern-matching and equality
> comparisons is certainly respectable -- but I so far haven't found any
> cases where it's a clear win. There's limited glory in being the
> almost-fastest way of indexing for a certain class of queries.
>

I think that current implementation of suffix tree (in particular
implementation of consistent methods) in spgist is far from reveal full
potential of suffix tree. I see at least two additional search types which
can be efficiently evaluated using suffix tree:
1) Search by prefix regular expression. For example, search for
/^a(bc|d)e[fgh]/ in single index scan.
2) Search by levenshtein distance, i.e. "SELECT * FROM tbl WHERE
levenshtein(col, 'some constant') < k;" using index (surely actual syntax
for such index scan would be anouther).
This is only two additional applications which first comes to my mind,
there could be other. Unfortunately, I don't have enough of time for it
now. But I like to put my hands on this in future.

Admittedly, I haven't tried the point-in-box stuff yet.
>

In short, I believe spgist for geometrical data-types is more CPU-effective
and less IO-effective than gist. Gist have to call consistent method for
all index tuples of the page it uses in scan, while spgist scans smaller
nodes. Sure, comprehensive testing is required for doing final conclusion
about that.

-----
With best regards,
Alexander Korotkov.

In response to

Re: spgist text_ops and LIKE at 2012-02-02 18:21:40 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Magnus Hagander	2012-02-02 19:14:12	Re: keywords in pg_hba.conf are case-sensitive?
Previous Message	Tom Lane	2012-02-02 19:02:05	Re: spgist text_ops and LIKE