Performance ts_vector fulltext search

From: Luigi Saggese <luigisag(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance ts_vector fulltext search
Date: 2013-04-11 07:47:20
Message-ID: CAGxgJgG6Uqjx5WjzOc-_uODq00MiRPJP=ja25FvMr1pU8s4XnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I've configured 2 table like this

CREATE TABLE "public"."User_Statement_Pivot" (
"Email" varchar(50),
"UserId" varchar(50),
"ShortId" varchar(50),
"LastDirectJobMailSentDateTime" int8,
"What" varchar(4096),
"Where" varchar(4096)
)
WITH (OIDS=FALSE)
;

ALTER TABLE "public"."User_Statement_Pivot" OWNER TO "postgres";

CREATE INDEX "IX_btree_usp_userIdShortIdEmailLastDJMSDT" ON
"public"."User_Statement_Pivot" USING btree ("UserId", "ShortId", "Email",
"LastDirectJobMailSentDateTime");

CREATE INDEX "ix_fulltext_usp_what" ON "public"."User_Statement_Pivot"
("to_tsvector('italian'::regconfig, ""What""::text)",
"to_tsvector('italian'::regconfig, ""What""::text)");

CREATE INDEX "ix_fulltext_usp_what_en" ON "public"."User_Statement_Pivot"
("to_tsvector('english'::regconfig, ""What""::text)",
"to_tsvector('english'::regconfig, ""What""::text)");

CREATE INDEX "ix_fulltext_usp_where" ON "public"."User_Statement_Pivot"
("to_tsvector('italian'::regconfig, ""Where""::text)",
"to_tsvector('italian'::regconfig, ""Where""::text)");

CREATE INDEX "ix_usp_what" ON "public"."User_Statement_Pivot" USING btree
("What");

CREATE TABLE "public"."User_Statement_Pivot_2" (
"Email" varchar(50),
"UserId" varchar(50),
"ShortId" varchar(50),
"LastDirectJobMailSentDateTime" int8,
"Where" varchar(4096),
"tsv" tsvector
)
WITH (OIDS=FALSE)
;

ALTER TABLE "public"."User_Statement_Pivot_2" OWNER TO "postgres";

CREATE INDEX "IX_btree_usp2_userIdShortIdEmailLastDJMSDT" ON
"public"."User_Statement_Pivot_2" USING btree ("UserId", "ShortId",
"Email", "LastDirectJobMailSentDateTime");

CREATE INDEX "textsearch_tsv" ON "public"."User_Statement_Pivot_2" ("tsv");

CREATE TRIGGER "tsvectorupdate" BEFORE INSERT OR UPDATE ON
"public"."User_Statement_Pivot_2"
FOR EACH ROW
EXECUTE PROCEDURE "tsvector_update_trigger"('tsv', 'pg_catalog.italian',
'What');

Column "What" (table User_Statement_Pivot is just a single word or max 2
words separeted by space " " (ex: programmatore .NET), and tsv
(table User_Statement_Pivot_2) is populate by materializing column with a
ts_vector of "What".

Now if i perform those 2 queries

SELECT * FROM "User_Statement_Pivot_2"
where tsv @@ to_tsquery('italian','programmatore|analista')

SELECT * FROM "User_Statement_Pivot"
where to_tsvector('italian', tsv) @@
to_tsquery('italian','programmatore|analista')

Records on Tables (are same) like 8 milion.

Execution time of 1st query is 2 seconds (result set like 13.027)
Execution time of 2st query is 3 seconds (result set like 13.027) same
records

Those are query analize

Bitmap Heap Scan on "User_Statement_Pivot" (cost=1025.27..109801.47
rows=76463 width=88) (actual time=3.186..12.608 rows=13027 loops=1)
Recheck Cond: (to_tsvector('italian'::regconfig, ("What")::text) @@
'''programm'' | ''anal'''::tsquery)
-> Bitmap Index Scan on ix_fulltext_usp_what (cost=0.00..1006.16
rows=76463 width=0) (actual time=2.315..2.315 rows=13027 loops=1)
Index Cond: (to_tsvector('italian'::regconfig, ("What")::text) @@
'''programm'' | ''anal'''::tsquery)
Total runtime: 12.972 ms

Bitmap Heap Scan on "User_Statement_Pivot_2" (cost=205.46..43876.92
rows=15068 width=102) (actual time=3.135..18.141 rows=13027 loops=1)
Recheck Cond: (tsv @@ '''programm'' | ''anal'''::tsquery)
-> Bitmap Index Scan on textsearch_tsv (cost=0.00..201.69 rows=15068
width=0) (actual time=2.254..2.254 rows=13027 loops=1)
Index Cond: (tsv @@ '''programm'' | ''anal'''::tsquery)
Total runtime: 18.502 ms

Configuration
PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

If i increase words number in to_tsquery in OR condition those 2 queries
are more different (exponentially). I don't understand why a materialized
column is more slow than a calculeted one.

--
Luigi Saggese
Analyst Developer

*Work:* +39 328 75 16 236
*Email:* luigisag(at)gmail(dot)com
*IM:* luigisaggese (Skype)
*http://it.linkedin.com/in/luigisaggese*

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Singer 2013-04-11 14:20:11 Re: slow bitmap heap scans on pg 9.2
Previous Message Steve Singer 2013-04-10 23:54:09 Re: slow bitmap heap scans on pg 9.2