From: | Hiroyuki Sato <hiroysato(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Kretschmer <andreas(at)a-kretschmer(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: grep -f keyword data query |
Date: | 2015-12-27 23:53:38 |
Message-ID: | CA+Tq-RpTFPoM8O5RTdwA8iYAuH4vP4LN=TCFGEV-KCLBUa4-6w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello Andreas and Tom
Thank you for replying.
Sorry, I re-created my questions. I was mis-pasted query log on previous
question.
(@~ operator is PGroonga extension (http://pgroonga.github.io))
Please ignore it.
Best regards.
1, Problem.
(1) Following query is exteme slow. (478sec)
SELECT
u.url
FROM
url_lists4 u,
keywords4 k
WHERE
u.url like k.url
AND
k.name = 'esc_url';
(2) grep -f kwd.txt sample.txt (exec time under 1sec)
2, Questions
(1) Is it possible to improve this query like the command ``grep -f
keyword data``?
(2) What kind of Index should I create on url_lists table?
3, Environment
OS: CentOS7
PostgreSQL 9.4
4, sample source
https://github.com/hiroyuki-sato/postgres_like_test
5, Create table
drop table if exists url_lists4;
create table url_lists4 (
id int not null primary key,
url text not null
);
create index ix_url_url_lists4 on url_lists4(url);
drop table if exists keywords4;
create table keywords4 (
id int not null primary key,
name varchar(40) not null,
url text not null
);
create index ix_url_keywords4 on keywords4(url);
create index ix_name_keywords4 on keywords4(name);
\copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
\copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';
vacuum url_lists4;
vacuum keywords4;
analyze url_lists4;
analyze keywords4;
6, Query
EXPLAIN SELECT
u.url
FROM
url_lists4 u,
keywords4 k
WHERE
u.url like k.url
AND
k.name = 'esc_url';
EXPLAIN ANALYZE SELECT
u.url
FROM
url_lists4 u,
keywords4 k
WHERE
u.url like k.url
AND
k.name = 'esc_url';
SELECT
u.url
FROM
url_lists4 u,
keywords4 k
WHERE
u.url like k.url
AND
k.name = 'esc_url';
7, EXPLAIN
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop (cost=0.00..37510799.00 rows=12500000 width=57)
Join Filter: (u.url ~~ k.url)
-> Seq Scan on url_lists4 u (cost=0.00..10682.00 rows=500000
width=57)
-> Materialize (cost=0.00..129.50 rows=5000 width=28)
-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000
width=28)
Filter: ((name)::text = 'esc_url'::text)
(6 rows)
8, EXPLAIN ANALYZE
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..37510799.00 rows=12500000 width=57) (actual
time=6011.642..478011.117 rows=4850 loops=1)
Join Filter: (u.url ~~ k.url)
Rows Removed by Join Filter: 2499995150
-> Seq Scan on url_lists4 u (cost=0.00..10682.00 rows=500000
width=57) (actual time=0.034..192.646 rows=500000 loops=1)
-> Materialize (cost=0.00..129.50 rows=5000 width=28) (actual
time=0.000..0.261 rows=5000 loops=500000)
-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000
width=28) (actual time=0.021..1.705 rows=5000 loops=1)
Filter: ((name)::text = 'esc_url'::text)
Planning time: 0.061 ms
Execution time: 478011.773 ms
(9 rows)
2015年12月28日(月) 3:39 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Andreas Kretschmer <andreas(at)a-kretschmer(dot)de> writes:
> >> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> hat am 27. Dezember 2015 um 19:11
> geschrieben:
> >> What in the world is this @~ operator? And what sort of index are
> >> you using now, that can accept it? Are the rowcount estimates in
> >> the EXPLAIN output accurate? (If they are, it's hardly surprising
> >> that the query takes a long time.)
>
> > in a privat mail he called an other operator: ~~. I think, the @~ is an
> error.
>
> Well, ~~ isn't directly indexable by btree indexes either, so there's
> still something wrong with either the EXPLAIN output or the claimed
> index definitions.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-12-28 00:36:47 | Re: grep -f keyword data query |
Previous Message | Andreas Kretschmer | 2015-12-27 22:12:33 | Re: Options for complex materialized views sharing most of the same logic? |