From: | Daniel Kalchev <daniel(at)digsys(dot)bg> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: again on index usage |
Date: | 2002-01-10 09:39:31 |
Message-ID: | 200201100939.LAA28676@dcave.digsys.bg |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>>>Tom Lane said:
> Daniel Kalchev <daniel(at)digsys(dot)bg> writes:
> > In any case, if we need to hit 50k pages (assuming the indexed data is
> > randomly scattered in the file), and having to read these three times each
, it
> > will be less I/O than having to read 1.7 million records.
>
> How do you arrive at that? Assuming 100 records per page (probably the
> right order of magnitude), the seqscan alternative is 17k page reads.
> Yes, you examine more tuples, but CPUs are lots faster than disks.
I tried this:
db=# select * into iplog_test from iplog_gate200112;
SELECT
db=# create index iplog_test_ipaddr_idx on iplog_test(ipaddr);
CREATE
db=# cluster iplog_test_ipaddr_idx on iplog_test;
CLUSTER
db=# create index iplog_test_ipdate_idx on iplog_test(ipdate);
CREATE
db=# vacuum verbose analyze iplog_test;
NOTICE: --Relation iplog_test--
NOTICE: Pages 17722: Changed 0, reaped 0, Empty 0, New 0; Tup 1706202: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 80, MaxLen 88; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 1.48s/-1.86u sec.
NOTICE: Index iplog_test_ipaddr_idx: Pages 5621; Tuples 1706202. CPU
0.51s/1.80u sec.
NOTICE: Index iplog_test_ipdate_idx: Pages 4681; Tuples 1706202. CPU
0.36s/1.92u sec.
NOTICE: --Relation pg_toast_253297758--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL
0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0;
EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_253297758_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE: Analyzing...
VACUUM
db=# explain
db-# SELECT sum(input), sum(output) FROM iplog_test
db-# WHERE
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02'
AND
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02'
AND
db-# ipaddr <<= '193.68.240.0/20' AND 'uni-gw' ~ router;
NOTICE: QUERY PLAN:
Aggregate (cost=56112.97..56112.97 rows=1 width=16)
-> Seq Scan on iplog_test (cost=0.00..56111.54 rows=284 width=16)
EXPLAIN
query runs for ca 3.5 minutes.
db=# set enable_seqscan to off;
the query plan is
Aggregate (cost=100507.36..100507.36 rows=1 width=16)
-> Index Scan using iplog_test_ipdate_idx on iplog_test
(cost=0.00..100505.94 rows=284 width=16)
query runs for ca 2.2 minutes.
Moves closer to your point :-)
Anyway, the platform is an dual Pentium III 550 MHz (intel) machine with 512
MB RAM, with 15000 RPM Cheetah for the database, running BSD/OS 4.2. The
machine is reasonably loaded all the time, so this is very much real-time test.
I agree, that with the 'wrong' clustering the index scan is not so much faster
than the sequential scan.
Perhaps I need to tune this machine's costs to prefer more disk intensive
operations over CPU intensive operations?
Let's see what 4.2 will result in.
Daniel
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Meskes | 2002-01-10 10:37:23 | Re: ECPG: include sqlca |
Previous Message | Justin Clift | 2002-01-10 09:16:25 | Re: Increasing checkpoint distance helps 7.2 noticeably |