From: | Ron Mayer <ron(at)intervideo(dot)com> |
---|---|
To: | <pgsql-bugs(at)postgresql(dot)org> |
Cc: | Ron Mayer <ron(at)intervideo(dot)com> |
Subject: | Inconsistant use of index. |
Date: | 2002-03-26 17:06:39 |
Message-ID: | Pine.LNX.4.33.0203260845110.16667-100000@ron |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
In porting a pretty large (10s of millions of records) data warehouse
from Oracle to PostgreSQL,
Once some of my tables started getting pretty large, PostgreSQL
suddenly stopped using indexes when I use expressions like "col = value"
decreasing performance by 20X. This meant that my daily reports started
taking two days instead of 2 hours to run!!!!).
Interestingly when I re-write the queries using >= and <= to produce
identical results, the index works fine. Example queries in question include:
select count(*) from fact where dat='2002-03-01';
select count(*) from fact where dat<='2002-03-01' and dat>='2002-03-01';
The distribution of values in "dat" are roughly evenly spaced from
'2002-01-01' through '2002-03-25'.
Attached below are
A: Information about the table, including "\d" and "vacuum verbose analyze"
B: Output of "explain analyze" from the above queries (showing the 20X
slowdown)
C: Version and configuration information.
Any suggestions on what I should look at next would be appreciated.
Thanks much,
Ron
PS: As a quite perverse workaround, I rewrote all my queries to have
"col<=val and col>=val" everywhere I used to have "col=val"
and everything is running fine again... but that's just wierd.
============================================================================
== A: Information about the table
============================================================================
logs2=# \d fact
Table "fact"
Column | Type | Modifiers
--------+------------------------+-----------
dat | date |
tim | time without time zone |
ip_id | integer |
bid_id | integer |
req_id | integer |
ref_id | integer |
Indexes: i_fact__bid_id,
i_fact__dat,
i_fact__ref_id,
i_fact__req_id,
i_fact__tim
logs2=# select count(*) from fact;
count
----------
18410778
(1 row)
logs2=# vacuum verbose analyze fact;
NOTICE: --Relation fact--
NOTICE: Pages 144967: Changed 0, Empty 0; Tup 18410778: Vac 0, Keep 0, UnUsed
0.
Total CPU 11.56s/2.97u sec elapsed 71.91 sec.
NOTICE: Analyzing fact
VACUUM
============================================================================
== B: Explain Analyze for the two queries.
== Note that the <=, >= one was over 20X faster.
============================================================================
logs2=# explain analyze select count(*) from fact where dat='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=375631.14..375631.14 rows=1 width=0) (actual
time=76689.42..76689.42 rows=1 loops=1)
-> Seq Scan on fact (cost=0.00..375101.72 rows=211765 width=0) (actual
time=20330.96..76391.94 rows=180\
295 loops=1)
Total runtime: 76707.92 msec
EXPLAIN
logs2=# explain analyze select count(*) from fact where dat<='2002-03-01' and
dat >='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=5.98..5.98 rows=1 width=0) (actual time=2921.39..2921.39
rows=1 loops=1)
-> Index Scan using i_fact__dat on fact (cost=0.00..5.98 rows=1 width=0)
(actual time=73.55..2583.53 ro\
ws=180295 loops=1)
Total runtime: 2921.55 msec
EXPLAIN
logs2=#
============================================================================
== C: Version and configuration information.
============================================================================
[17]localhost:~/apps/pgsql% psql -V
psql (PostgreSQL) 7.2
contains support for: readline, history
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996, Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.
[17]localhost:/scratch/pgsql/data% diff -wiu postgresql.conf postgresql.conf.bak
--- postgresql.conf Sat Mar 23 15:39:34 2002
+++ postgresql.conf.bak Tue Mar 5 19:33:54 2002
@@ -50,7 +50,7 @@
#shared_buffers = 10000 # 2*max_connections, min 16
## goes to about 84 meg with 4000.
#shared_buffers = 4000 # 2*max_connections, min 16
-shared_buffers = 10000 # 2*max_connections, min 16
+shared_buffers = 8000 # 2*max_connections, min 16
#max_fsm_relations = 100 # min 10, fsm is free space map
#max_fsm_pages = 10000 # min 1000, fsm is free space map
--
Ronald Mayer
Director of Web Business
InterVideo, Inc.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Beckstette | 2002-03-26 17:18:02 | postmaster crash |
Previous Message | Vaishakhi Ajmera | 2002-03-26 16:09:07 | Installing postgres |