From: | ERR ORR <rd0002(at)gmail(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Question on Trigram GIST indexes |
Date: | 2013-01-05 18:20:41 |
Message-ID: | CALtFtELEvreyXn3fgk9GPWDHvrRXNeVQsSgpuicZZL_oQ4oHmw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
@Moderators: I am reposting this because the original from 22 December
apparently didn't arrive on the list.
I was trying to make Postgresql use a trigram gist index on a varchar
field, but to no avail.
Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html
I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.
My full table definition is
CREATE TABLE "TEST"
(
"RECID" bigint NOT NULL DEFAULT next_id(),
"TST_PAYLOAD" character varying(255),
CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
USING INDEX TABLESPACE local
)
WITH (
OIDS=FALSE
);
CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
ON "TEST"
USING btree
("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
ON "TEST"
USING gist
("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
ON "TEST"
USING gin
("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;
The COLLATE pg_catalog."default" clause is inserted by the DB (default is
"Unicode"). I also tried to define the Trigram index with COLLATE
pg_catalog."C" but the behavior did not change. I did vacuum and analyze
after creating each index.
The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.
I have pg_tgrm installed - actually all extensions are present.
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
index but do a full table scan instead.
(I am looking for names like 'SEATTLE' in this example)
I also tried dropping the btree index but that has no influence on the
behavior.
I have texts/strings in different languages/charsets, so UTF8 looked like
the best decision to me, instead of, say, ISO-8859-15, which is limited to
just some European charsets. Specifically I am storing strings in European
languages (corresponding to the ISO-8859 series) including diacrites line
äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of
making different columns/tables and using them via a view because that's my
use case and UTF8 should accommodate that IMHO (or is that an abuse of the
DB?)
Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
possible? (Oracle doesn't allow that iirc).
Thanks for any insights, pointers ...
I'd be grateful if anybody could explain to me what I am doing wrong.
Thanks in advance.
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Travers | 2013-01-06 02:18:23 | Counterintuitive locking behavior |
Previous Message | John R Pierce | 2013-01-05 05:53:40 | Re: recasting to timestamp from varchar |