From: | Hiroyuki Sato <hiroysato(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: grep -f keyword data query |
Date: | 2015-12-29 15:21:28 |
Message-ID: | CA+Tq-Rrm_mf_DQzwSjaLmEfoU9Yv9kcXVavxMxH+S9Kt3ZQb5A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello Jeff
Thank you for replying.
2015年12月29日(火) 4:35 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:
> On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato <hiroysato(at)gmail(dot)com>
> wrote:
> > Hello Andreas and Tom
> >
> > Thank you for replying.
> >
> > Sorry, I re-created my questions. I was mis-pasted query log on previous
> > question.
> > (@~ operator is PGroonga extension (http://pgroonga.github.io))
> > Please ignore it.
> >
> > Best regards.
> >
> > 1, Problem.
> > (1) Following query is exteme slow. (478sec)
> > SELECT
> > u.url
> > FROM
> > url_lists4 u,
> > keywords4 k
> > WHERE
> > u.url like k.url
> > AND
> > k.name = 'esc_url';
> >
> >
> > (2) grep -f kwd.txt sample.txt (exec time under 1sec)
>
> Certainly not in my hands. The best I can get is 9 seconds.
>
> >
> > 2, Questions
> >
> > (1) Is it possible to improve this query like the command ``grep -f
> > keyword data``?
>
> You will not get a general tool to match a specialized tool in the
> specialized tool's own domain. fgrep is good at what fgrep does.
>
This is just same data and real data is forward proxy log.
There are no own domain. It is contain 20,000,000 log per day.
>
>
> Since your queries all have constant text strings at the beginning,
> they could use the index. But if you are not using the C collation,
> then you need build a special index:
>
> create index on url_lists4 (url text_pattern_ops);
>
Currently I just use C collation (ASCII) only.
>
> But, the planner refuses to use this index for your query anyway,
> because it can't see that the patterns are all left-anchored.
>
> Really, your best bet is refactor your url data so it is stored with a
> url_prefix and url_suffix column. Then you can do exact matching
> rather than pattern matching.
>
I see, exact matching faster than pattern matting.
But I need pattern match in path part
(ie, http://www.yahoo.com/a/b/c/... )
I would like to pattern match '/a/b/c' part.
That's why I asked this question.
If it is impossible to improve join speed, I will dump data once, and
match it with grep or something tools.
Thanks.
>
> Cheers,
>
> Jeff
>
From | Date | Subject | |
---|---|---|---|
Next Message | Melvin Davidson | 2015-12-29 19:10:32 | Re: Regex help again (sorry, I am bad at these) |
Previous Message | Hiroyuki Sato | 2015-12-29 15:11:00 | Re: grep -f keyword data query |