From: | Luigi Saggese <luigisag(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Performance ts_vector fulltext search |
Date: | 2013-04-11 07:47:20 |
Message-ID: | CAGxgJgG6Uqjx5WjzOc-_uODq00MiRPJP=ja25FvMr1pU8s4XnQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I've configured 2 table like this
CREATE TABLE "public"."User_Statement_Pivot" (
"Email" varchar(50),
"UserId" varchar(50),
"ShortId" varchar(50),
"LastDirectJobMailSentDateTime" int8,
"What" varchar(4096),
"Where" varchar(4096)
)
WITH (OIDS=FALSE)
;
ALTER TABLE "public"."User_Statement_Pivot" OWNER TO "postgres";
CREATE INDEX "IX_btree_usp_userIdShortIdEmailLastDJMSDT" ON
"public"."User_Statement_Pivot" USING btree ("UserId", "ShortId", "Email",
"LastDirectJobMailSentDateTime");
CREATE INDEX "ix_fulltext_usp_what" ON "public"."User_Statement_Pivot"
("to_tsvector('italian'::regconfig, ""What""::text)",
"to_tsvector('italian'::regconfig, ""What""::text)");
CREATE INDEX "ix_fulltext_usp_what_en" ON "public"."User_Statement_Pivot"
("to_tsvector('english'::regconfig, ""What""::text)",
"to_tsvector('english'::regconfig, ""What""::text)");
CREATE INDEX "ix_fulltext_usp_where" ON "public"."User_Statement_Pivot"
("to_tsvector('italian'::regconfig, ""Where""::text)",
"to_tsvector('italian'::regconfig, ""Where""::text)");
CREATE INDEX "ix_usp_what" ON "public"."User_Statement_Pivot" USING btree
("What");
CREATE TABLE "public"."User_Statement_Pivot_2" (
"Email" varchar(50),
"UserId" varchar(50),
"ShortId" varchar(50),
"LastDirectJobMailSentDateTime" int8,
"Where" varchar(4096),
"tsv" tsvector
)
WITH (OIDS=FALSE)
;
ALTER TABLE "public"."User_Statement_Pivot_2" OWNER TO "postgres";
CREATE INDEX "IX_btree_usp2_userIdShortIdEmailLastDJMSDT" ON
"public"."User_Statement_Pivot_2" USING btree ("UserId", "ShortId",
"Email", "LastDirectJobMailSentDateTime");
CREATE INDEX "textsearch_tsv" ON "public"."User_Statement_Pivot_2" ("tsv");
CREATE TRIGGER "tsvectorupdate" BEFORE INSERT OR UPDATE ON
"public"."User_Statement_Pivot_2"
FOR EACH ROW
EXECUTE PROCEDURE "tsvector_update_trigger"('tsv', 'pg_catalog.italian',
'What');
Column "What" (table User_Statement_Pivot is just a single word or max 2
words separeted by space " " (ex: programmatore .NET), and tsv
(table User_Statement_Pivot_2) is populate by materializing column with a
ts_vector of "What".
Now if i perform those 2 queries
SELECT * FROM "User_Statement_Pivot_2"
where tsv @@ to_tsquery('italian','programmatore|analista')
SELECT * FROM "User_Statement_Pivot"
where to_tsvector('italian', tsv) @@
to_tsquery('italian','programmatore|analista')
Records on Tables (are same) like 8 milion.
Execution time of 1st query is 2 seconds (result set like 13.027)
Execution time of 2st query is 3 seconds (result set like 13.027) same
records
Those are query analize
Bitmap Heap Scan on "User_Statement_Pivot" (cost=1025.27..109801.47
rows=76463 width=88) (actual time=3.186..12.608 rows=13027 loops=1)
Recheck Cond: (to_tsvector('italian'::regconfig, ("What")::text) @@
'''programm'' | ''anal'''::tsquery)
-> Bitmap Index Scan on ix_fulltext_usp_what (cost=0.00..1006.16
rows=76463 width=0) (actual time=2.315..2.315 rows=13027 loops=1)
Index Cond: (to_tsvector('italian'::regconfig, ("What")::text) @@
'''programm'' | ''anal'''::tsquery)
Total runtime: 12.972 ms
Bitmap Heap Scan on "User_Statement_Pivot_2" (cost=205.46..43876.92
rows=15068 width=102) (actual time=3.135..18.141 rows=13027 loops=1)
Recheck Cond: (tsv @@ '''programm'' | ''anal'''::tsquery)
-> Bitmap Index Scan on textsearch_tsv (cost=0.00..201.69 rows=15068
width=0) (actual time=2.254..2.254 rows=13027 loops=1)
Index Cond: (tsv @@ '''programm'' | ''anal'''::tsquery)
Total runtime: 18.502 ms
Configuration
PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
If i increase words number in to_tsquery in OR condition those 2 queries
are more different (exponentially). I don't understand why a materialized
column is more slow than a calculeted one.
--
Luigi Saggese
Analyst Developer
*Work:* +39 328 75 16 236
*Email:* luigisag(at)gmail(dot)com
*IM:* luigisaggese (Skype)
*http://it.linkedin.com/in/luigisaggese*
From | Date | Subject | |
---|---|---|---|
Next Message | Steve Singer | 2013-04-11 14:20:11 | Re: slow bitmap heap scans on pg 9.2 |
Previous Message | Steve Singer | 2013-04-10 23:54:09 | Re: slow bitmap heap scans on pg 9.2 |