From: | Manfred Koizar <mkoi-pg(at)aon(dot)at> |
---|---|
To: | Thomas Beutin <tyrone(at)laokoon(dot)IN-Berlin(dot)DE> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: INDEX suggestion needed |
Date: | 2002-12-12 21:00:48 |
Message-ID: | pgrhvus3d1nr8pp8311lopqd5uk4ffuir5@4ax.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, 12 Dec 2002 20:13:24 +0100, Thomas Beutin
<tyrone(at)laokoon(dot)IN-Berlin(dot)DE> wrote:
>tb=# VACUUM VERBOSE ANALYZE stat_pages;
>NOTICE: Pages 7809: Changed 0, Empty 0; Tup 343554: Vac 0, Keep 0, UnUsed 0.
>Aggregate (cost=3.19..3.19 rows=1 width=34) (actual time=52.89..52.89 rows=1 loops=1)
> -> Index Scan using tb5 on stat_pages (cost=0.00..3.18 rows=1 width=34) (actual time=52.74..52.74 rows=0 loops=1)
>Total runtime: 53.11 msec
>tb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-06-01' AND visit <= '2002-12-11');
This selects (almost) all rows. An index cannot help.
>i got the following index/sequence scans by date ranges:
>(visit >= '2002-06-01' AND visit <= '2002-06-30') index scan
0 rows satisfy this condition, index scan is ok
>(visit >= '2002-06-01' AND visit <= '2002-07-31') index scan
Ca. 10000 rows, 3% of the whole table, index scan ok
>(visit >= '2002-06-01' AND visit <= '2002-08-31') sequence scan
32%, sequence scan is expected to be faster, unless tuples are almost
perfectly ordered by visit or most of the table (~ 8000 pages) fits
into the cache. How much physical memory is installed? What are your
shared_buffers and effective_cache_size settings? You might want to
experiment with
SET enable_seqscan = off;
>(visit >= '2002-07-01' AND visit <= '2002-07-31') index scan
Same as 06-01 to 07-31.
>(visit >= '2002-08-01' AND visit <= '2002-08-31') sequence scan
Ca. 29%
>(visit >= '2002-09-01' AND visit <= '2002-09-30') sequence scan
More than 50%
>(visit >= '2002-10-01' AND visit <= '2002-10-31') index scan, but long (>5sec)
Ca. 3%; interesting that it takes so long; I'd like to see EXPLAIN
ANALYZE output for enable_seqscan on and off.
>And: The date range in my table is from 2002-07-10 11:36:53+02 up to
>2002-10-29 23:31:47+01.
Yes, this is approximately reflected by the histogram bounds.
> attname | null_frac | avg_wi | n_distinct | correlation
>---------+-----------+--------+------------+-------------
> visit | 0 | 8 | -0.543682 | -0.972118
The negative correlation looks strange. How did you insert your data?
> m_id | 0 | 2 | 1 | 1
^^^
Only one distinct value in m_id? This explains why your m_id index is
never used.
> attname | most_common_vals
>---------+--------------------------
> visit | "2002-08-21 10:29:10+02", ...
> m_id | 35
> attname | histogram_bounds
>---------+--------------------------
> visit | "2002-07-25 16:37:12+02"
> "2002-08-15 12:36:18+02"
> "2002-08-23 12:36:15+02"
> "2002-08-29 17:30:54+02"
> "2002-09-05 12:54:31+02"
> "2002-09-10 18:03:54+02"
> "2002-09-16 15:44:56+02"
> "2002-09-20 14:34:40+02"
> "2002-09-24 13:59:29+02"
> "2002-09-29 09:09:31+02"
> "2002-10-29 23:25:13+01"
> m_id |
>
>??? Is this output ok?
Almost. I forgot to ask for most_common_freqs (cut'n'paste error).
But I don't expect any value of visit to occur much more than twice,
so these values should be irrelevant to our estimations.
Servus
Manfred
From | Date | Subject | |
---|---|---|---|
Next Message | Lincoln Yeoh | 2002-12-12 21:02:28 | Re: Docs: GIST |
Previous Message | Doug Fields | 2002-12-12 20:38:32 | Moving Indices to a different disk and various performance questions |