Forcing the right queryplan

From: Henk van Lingen <H(dot)G(dot)K(dot)vanLingen(at)uu(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: Forcing the right queryplan
Date: 2010-08-31 14:50:09
Message-ID: 20100831145009.GD22680@uu.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-novice


Hi,

I've the problem my database is not using the 'right' queryplan in all
cases. Is there a way I can force that and/or how should I tuned the
table statistics?

I'm doing a rsyslog database in PostgreSQL with millions of records
(firewall logging). The db scheme is the so called 'MonitorWare' scheme,
to wich I added two extra indexes.

syslog=# select version();
version

--------------------------------------------------------------------------------
----------------------------------
PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20
080704 (Red Hat 4.1.2-48), 64-bit

syslog=# \d systemevents
Table "public.systemevents"
Column | Type | Modi
fiers
--------------------+-----------------------------+-----------------------------
------------------------------
id | integer | not null default nextval('sy
stemevents_id_seq'::regclass)
customerid | bigint |
receivedat | timestamp without time zone |
devicereportedtime | timestamp without time zone |
facility | smallint |
priority | smallint |
fromhost | character varying(60) |
message | text |
ntseverity | integer |
importance | integer |
eventsource | character varying(60) |
eventuser | character varying(60) |
eventcategory | integer |
eventid | integer |
eventbinarydata | text |
maxavailable | integer |
currusage | integer |
minusage | integer |
maxusage | integer |
infounitid | integer |
syslogtag | character varying(60) |
eventlogtype | character varying(60) |
genericfilename | character varying(60) |
systemid | integer |
Indexes:
"systemevents_pkey" PRIMARY KEY, btree (id)
"fromhost_idx" btree (fromhost)
"msgs_idx" gin (to_tsvector('english'::regconfig, message))

The GIN index is to do text searching (via LogAnalyzer).

Now there are two types of query plans:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 100; QUERY PLAN

--------------------------------------------------------------------------------
---------------------------------
Limit (cost=0.00..10177.22 rows=100 width=159)
-> Index Scan Backward using systemevents_pkey on systemevents (cost=0.00..
1052934.86 rows=10346 width=159)
Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
211.112.9'::text))
(3 rows)

This one is useless (takes very long). However this one:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 500;
QUERY PLAN

--------------------------------------------------------------------------------
-----------------------------------
Limit (cost=40928.89..40930.14 rows=500 width=159)
-> Sort (cost=40928.89..40954.76 rows=10346 width=159)
Sort Key: id
-> Bitmap Heap Scan on systemevents (cost=2898.06..40413.36 rows=1034
6 width=159)
Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
squery('131.211.112.9'::text))
-> Bitmap Index Scan on msgs_idx (cost=0.00..2895.47 rows=10346
width=0)
Index Cond: (to_tsvector('english'::regconfig, message) @@
to_tsquery('131.211.112.9'::text))
(7 rows)

works acceptable.

Stats:

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class WHERE relname LIKE 'systemevents%'; relname | relkind | reltuples | relpages
-------------------------------+---------+-------------+----------
systemevents_pkey | i | 2.06915e+06 | 71985
systemeventsproperties | r | 0 | 0
systemeventsproperties_pkey | i | 0 | 1
systemevents_id_seq | S | 1 | 1
systemeventsproperties_id_seq | S | 1 | 1
systemevents | r | 2.06915e+06 | 694826
(6 rows)

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'msg%';
relname | relkind | reltuples | relpages
----------+---------+-------------+----------
msgs_idx | i | 2.06915e+06 | 128069
(1 row)

How to use the right plan regardless of the 'LIMIT-size'?

Cheers,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie, (o- -+
Universiteit Utrecht, Jenalaan 18a, room 0.12 /\ |
phone: +31-30-2538453 v_/_ |
http://henk.vanlingen.net/ http://www.tuxtown.net/netiquette/

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Allan Kamau 2010-08-31 14:58:47 Finding intercept of two documents (two tsvector fields)
Previous Message Tom Lane 2010-08-31 14:39:30 Re: pg_dump --compress error

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2010-08-31 15:01:54 Re: Reduce Calculations in SELECT
Previous Message Josh Kupershmidt 2010-08-31 14:48:14 Re: Reduce Calculations in SELECT