Re: [GENERAL] Query Optimisation

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: <psql-mail(at)freeuk(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [GENERAL] Query Optimisation
Date: 2003-07-15 16:09:19
Message-ID: 20030715090339.K95323-100000@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance


[replying to -performance]

On Tue, 15 Jul 2003 psql-mail(at)freeuk(dot)com wrote:

> Hi, I have the following query - is there anything i've missed or is it
> just slow?!

The fact that it underestimates the number of matching message rows by a
factor of about 4000 doesn't help. I'm not sure you're going to be able to
get a better estimate using message.header||message.body ILIKE '%chicken%'
(possibly using two ilikes with or might help but probably not enough).
Have you vacuum analyzed the two tables recently? The seq scan cost on
message seems fairly low given what I would expect to be the size of that
table.

> I am currently in the process of setting up full text indexing as
> described in the techdocs.postgresql.org i guess this is the main way
> of speeding up searches through ~40GB of bulk text?

That's still probably the best way.

> EXPLAIN ANALYZE SELECT meta.msg_id, date, from_line, subject FROM
> message ,meta WHERE meta.date >= '15-06-2003 00:00:00' AND meta.date <=
> '26-06-2003 00:00:00' AND message.header||message.body ILIKE '%chicken%'
> AND meta.sys_id = message.sys_id ORDER BY meta.date DESC;
> QUERY PLAN
>
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ------
> Sort (cost=50.16..50.17 rows=1 width=120) (actual time=412333.65..
> 412333.76 rows=166 loops=1)
> Sort Key: meta.date
> -> Nested Loop (cost=0.00..50.15 rows=1 width=120) (actual time=
> 400713.41..412332.53 rows=166 loops=1)
> -> Seq Scan on message (cost=0.00..25.00 rows=5 width=8) (actual time=
> 58.18..410588.49 rows=20839 loops=1)
> Filter: ((header || body) ~~* '%chicken%'::text)
> -> Index Scan using meta_pkey on meta (cost=0.00..5.02 rows=1 width=
> 112) (actual time=0.07..0.07 rows=0 loops=20839)
> Index Cond: (meta.sys_id = "outer".sys_id)
> Filter: ((date >= '2003-06-15 00:00:00'::timestamp without time zone)
> AND (date <= '2003-06-26 00:00:00'::timestamp without time zone))
> Total runtime: 412334.08 msec
> (9 rows)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Dmitry Tkach 2003-07-15 16:15:42 INSTEAD rule bug?
Previous Message Tom Lane 2003-07-15 16:03:53 Re: Billions of records?

Browse pgsql-performance by date

  From Date Subject
Next Message Jonathan Bartlett 2003-07-15 16:21:26 Re: Query Optimisation
Previous Message psql-mail 2003-07-15 15:59:31 Query Optimisation