Re: again on index usage

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: again on index usage
Date: 2002-01-11 05:32:25
Message-ID: 200201110532.HAA15650@dcave.digsys.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>Tom Lane said:
> It would be interesting to check whether there is any correlation
> between ipaddr and ipdate in your test data. Perhaps clustering
> on ipaddr might not destroy the ordering on ipdate as much as you
> thought. A more clearly random-order test would go:
>
> select * into iplog_test from iplog_gate200112 order by random();
> create index iplog_test_ipdate_idx on iplog_test(ipdate);
> vacuum verbose analyze iplog_test;
> << run queries >>

NOTICE: --Relation iplog_test--
NOTICE: Pages 17761: Changed 17761, reaped 0, Empty 0, New 0; Tup 1706202:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 80, MaxLen 88; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 2.36s/0.32u sec.
NOTICE: Index iplog_test_ipdate_idx: Pages 4681; Tuples 1706202. CPU
0.26s/1.98u sec.
NOTICE: --Relation pg_toast_275335644--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL
0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0;
EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_275335644_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE: Analyzing...

explain (with enable_seqscan to on)

Aggregate (cost=56151.97..56151.97 rows=1 width=16)
-> Seq Scan on iplog_test (cost=0.00..56150.54 rows=284 width=16)

average run time 4 minutes

(iostat before seq run)

tin tout sps tps msps sps tps msps usr nic sys int idl
5 44 94 6 5.9 6842 120 3.3 25 11 6 0 58
1 14 27 1 5.9 5968 138 3.6 22 11 5 0 62
5 44 58 2 6.3 5647 117 3.0 27 9 6 0 58
1 13 7 1 5.0 5723 125 3.4 24 10 5 0 61
0 15 50 2 4.5 5606 110 3.1 27 10 5 0 58
5 44 90 5 9.1 4702 87 2.5 28 12 4 0 55
1 15 52 1 6.3 5045 114 4.1 24 10 4 0 61

(iostat during seq run)

tin tout sps tps msps sps tps msps usr nic sys int idl
1 40 41 2 2.1 5847 123 3.6 25 11 4 0 60
1 16 164 13 6.2 7280 128 3.2 28 8 8 0 57
0 13 36 1 7.9 6059 147 3.9 25 8 5 0 62
0 13 48 3 5.3 6691 126 3.4 26 8 7 0 59
0 13 28 3 4.6 6473 103 2.3 28 11 7 0 54
0 13 138 11 7.6 10848 151 4.9 19 6 6 0 69
0 13 33 3 3.3 5568 100 3.6 21 16 3 0 59
0 13 24 2 1.1 6752 144 3.4 22 12 2 0 64

(sometime at the end of query run)

tin tout sps tps msps sps tps msps usr nic sys int idl
0 38 20 2 5.5 5621 57 1.2 23 23 3 0 51
0 13 89 7 7.7 8854 101 3.4 21 18 4 0 57
0 13 72 6 7.3 9929 88 2.2 18 21 4 0 57
0 13 129 6 9.6 4865 43 1.0 21 24 4 0 51
0 13 72 3 4.2 5421 46 0.5 24 22 4 0 50
0 13 52 2 3.5 5877 64 1.8 22 21 4 0 53
0 13 50 3 6.1 5561 54 1.7 19 26 3 0 52
0 13 42 1 3.8 5455 76 1.4 22 22 3 0 53

(see lower msps - perhaps other queries slowed down? - but then again returned
to 'normal')

0 13 244 20 6.6 6629 199 4.1 19 9 5 0 67
0 13 68 4 6.2 6080 191 4.3 14 14 3 0 70
0 13 75 3 5.9 6542 214 4.1 19 8 4 0 70
0 13 615 18 5.0 5454 129 4.1 20 18 3 0 59
0 13 88 2 5.7 3718 48 2.5 21 21 4 0 54
0 13 46 3 3.1 4533 75 2.9 20 19 5 0 56
0 13 143 7 5.1 4349 58 2.7 22 18 4 0 55
0 13 58 2 9.9 4038 45 2.0 20 24 4 0 52
0 13 111 4 5.8 4523 60 3.4 18 22 4 0 56

(with enable_seqscan to off)

Aggregate (cost=85110.08..85110.08 rows=1 width=16)
-> Index Scan using iplog_test_ipdate_idx on iplog_test
(cost=0.00..85108.66 rows=284 width=16)

average run time 2 minutes (worst 5 minutes, but this happened only once)

(iostat output)

tin tout sps tps msps sps tps msps usr nic sys int idl
0 38 1 1 6.7 5110 224 3.2 7 16 3 0 73
0 13 21 2 3.6 5249 219 4.1 12 13 2 0 73
0 13 0 0 10.0 5341 216 3.9 12 11 4 0 73
0 13 6 0 9.9 5049 218 3.5 10 14 3 0 72
0 13 6 0 0.0 7654 216 3.8 10 10 2 0 78
0 13 2 1 4.0 8758 222 4.1 6 11 4 0 80
0 13 6 0 9.9 8594 219 4.4 6 10 3 0 81
0 13 4 1 0.0 7290 210 4.3 6 10 4 0 80
0 13 36 3 4.9 5912 196 4.7 9 10 4 0 76
0 13 7 1 13.3 4787 209 3.7 9 17 2 0 72
0 13 0 0 10.0 4691 209 3.8 9 17 2 0 72

This sort of proves that Tom is right about the need for random data. <grin>

Having the change to play with it at such idle time in the morning, when
nobody is supposed to be working with the databae I dared to shut down all
other sessions

(idle iostat)

iostat 5
tty sd0 sd1 % cpu
tin tout sps tps msps sps tps msps usr nic sys int idl
0 559 148 2 7.7 8 1 3.3 0 0 0 0 99
0 559 235 4 4.1 30 2 2.0 0 0 1 0 98
0 559 189 2 8.7 3 0 10.0 0 0 1 0 99

seq scan (10 sec)

0 575 12103 101 9.2 18798 164 1.4 19 0 8 0 73
0 575 10118 82 11.2 30507 243 0.9 33 0 11 0 55
0 559 12704 101 9.3 7642 61 0.7 9 0 8 0 83

index scan 45 sec

0 38 8 1 4.0 6179 375 2.5 1 0 2 0 98
0 13 4 1 2.5 6208 378 2.4 1 0 2 0 97
0 13 7 1 3.3 6323 382 2.4 1 0 2 0 97
0 13 9 2 0.0 6310 389 2.4 1 0 2 0 97
0 13 0 0 0.0 3648 226 2.3 0 0 1 0 98

This proves to me 15000 RPM Cheetah drives are damn fast at sequential reads.
Why index scan wins at 'normal' database load??? I would expect that if the
disk subsystem is overloaded, it will be more overloaded seek-wise, than
troughput-wise. Perhaps this penalizes the 'expected to be faster' sequential
reads much more than the random page index reads.

I will try 4.2 with the same data as soon as available and see the differences
- on a separate machine with about the same configuration (same OS,same
postgres install, same disks, only one 733 MHz CPU instead of two 550 MHz
CPUs).

Daniel

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Kalchev 2002-01-11 05:40:41 Re: pg_upgrade
Previous Message Bruce Momjian 2002-01-11 05:21:30 Re: pg_upgrade