From: | Hiroyuki Sato <hiroysato(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: grep -f keyword data query |
Date: | 2015-12-28 02:00:58 |
Message-ID: | CA+Tq-RqhjYYMopJpcCBDyS6ijvXj_GOVmFX5nijmOkM-V__PVg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello Tom.
Thank you for replying.
I re-created index with pg_trgm.
Execution time is 210sec.
Yes It is faster than btree index. But still slow.
It is possible to improve this query speed?
Should I use another query or idex?
Best regards.
1, query
SELECT
u.url
FROM
url_lists4 u,
keywords4 k
WHERE
u.url like k.url
AND
k.name = 'esc_url';
2, create table (gist grgm_ops)
drop table if exists url_lists4;
create table url_lists4 (
id int not null primary key,
url text not null
);
--create index ix_url_url_lists4 on url_lists4(url);
--
-- change index
--
create index ix_url_url_lists4 on url_lists4 using gist(url
gist_trgm_ops);
drop table if exists keywords4;
create table keywords4 (
id int not null primary key,
name varchar(40) not null,
url text not null
);
create index ix_url_keywords4 on keywords4(url);
create index ix_name_keywords4 on keywords4(name);
\copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
\copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';
vacuum url_lists4;
vacuum keywords4;
analyze url_lists4;
analyze keywords4;
3, EXPLAIN
QUERY PLAN
-------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..432684.50 rows=12500000 width=57)
-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000 width=28)
Filter: ((name)::text = 'esc_url'::text)
-> Index Scan using ix_url_url_lists4 on url_lists4 u
(cost=0.41..61.52 rows=2500 width=57)
Index Cond: (url ~~ k.url)
(5 rows)
4, EXPLAIN ANALYZE
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..432684.50 rows=12500000 width=57) (actual
time=1048.428..210641.655 rows=4850 loops=1)
-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000 width=28)
(actual time=0.033..28.878 rows=5000 loops=1)
Filter: ((name)::text = 'esc_url'::text)
-> Index Scan using ix_url_url_lists4 on url_lists4 u
(cost=0.41..61.52 rows=2500 width=57) (actual time=41.426..42.115 rows=1
loops=5000)
Index Cond: (url ~~ k.url)
Rows Removed by Index Recheck: 0
Planning time: 0.138 ms
Execution time: 210642.896 ms
(8 rows)
2015年12月28日(月) 9:36 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Hiroyuki Sato <hiroysato(at)gmail(dot)com> writes:
> > Sorry, I re-created my questions. I was mis-pasted query log on previous
> > question.
> > (@~ operator is PGroonga extension (http://pgroonga.github.io))
> > [ "like" is actually the operator of interest ]
>
> Ah. You might get some good results with trigram indexes (ie,
> contrib/pg_trgm and a GIST or GIN index). See
>
> http://www.postgresql.org/docs/9.4/static/pgtrgm.html
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Kretschmer | 2015-12-28 11:23:16 | Re: Calling function (table_name, schema_name) within event trigger |
Previous Message | Tom Lane | 2015-12-28 00:36:47 | Re: grep -f keyword data query |