From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Why is that so slow?
Date: 1999-03-06 14:08:09
Message-ID: 199903061408.XAA00494@ext16.sra.co.jp
Lists: pgsql-hackers
> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> > NOTICE: QUERY PLAN:
> >
> > Nested Loop (cost=98.90 size=1 width=100)
> > -> Seq Scan on prefecture (cost=2.55 size=47 width=26)
> > -> Index Scan using pidindex on postal (cost=2.05 size=1 width=74)
> >
> > This is so slooow. Can anybody explain this? Am I missing something?
>
> and later:
> > I had defined a btree index on pid and it has 2000
> > duplicate entries in average! After I removed the index, the query
> > runs unbelievably fast! Now explain shows:
>
> > Nested Loop (cost=933.82 size=1 width=100)
> > -> Index Scan using cityindex on postal (cost=931.77 size=1 width=74)
> > -> Index Scan using prefpidindex on prefecture (cost=2.05 size=47 width=26)
>
> Hmm. Removal of the index is just a hack --- the system should have
> been smart enough not to use it. It looks like the system chose the
> first plan shown above because it thought that selecting postal entries
> matching a particular pid value would on average match only one postal
> tuple (note the "size" fields, which are estimates of the numbers of
> resulting tuples). But in reality, each scan produced 2000 matching
> entries on average, according to your second message --- and each of
> those entries had to be tested to see if it had the right city name.
> So, very slow.
>
> The question I have is why didn't the system realize that there would be
> lots of matches on pid? The "dispersion" statistics it keeps ought to
> have given it a clue that this approach wouldn't be very selective.
>
> The second example is fast because the scan over postal looking for city
> name matches finds only one match, so prefecture is scanned only once.
Actually not just one, since I use the ~ operator. Still, the number of
matching rows should be reasonably small.
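(For context, the reason an anchored pattern like this can still use cityindex: PostgreSQL can treat a regex anchored with `^` as a range scan on the btree index, roughly like the rewrite below. This is a sketch; the equivalence holds under the default "C" locale, which is an assumption here.)

```sql
-- Anchored regex: indexable as a range scan on cityindex
SELECT * FROM postal WHERE city ~ '^aaa';

-- Roughly what the planner can turn it into (C locale assumed):
SELECT * FROM postal WHERE city >= 'aaa' AND city < 'aab';

-- An unanchored pattern, by contrast, forces a sequential scan:
SELECT * FROM postal WHERE city ~ 'aaa';
```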
> However the cost estimates there also look bogus --- the system is again
> mis-guessing how many entries will be selected. It seems to think that
> all 47 prefecture entries will be matched by a scan for a specific pid.
> So, bogus dispersion values again (or bad use of them).
>
> Something is fishy here. Have you done a "vacuum analyze" since loading
> the data in these tables?
Oh, I never thought of that. After re-creating the index I had removed
earlier and running a vacuum analyze, I got:
Hash Join (cost=951.50 size=19 width=100)
-> Index Scan using cityindex on postal (cost=944.77 size=19 width=74)
-> Hash (cost=0.00 size=0 width=0)
-> Seq Scan on prefecture (cost=2.55 size=47 width=26)
This plan looks good (and is actually as fast as the previous
one). However, the cost estimate for prefecture is again 47?
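For reference, the sequence that produced the plan above was roughly the following (the exact CREATE INDEX statement is an assumption; the index and table names are taken from the plans quoted earlier):

```sql
-- Re-create the btree index on pid that was dropped earlier
CREATE INDEX pidindex ON postal (pid);

-- Refresh the planner's dispersion statistics for both tables
VACUUM ANALYZE postal;
VACUUM ANALYZE prefecture;

EXPLAIN SELECT * FROM postal, prefecture
WHERE city ~ '^aaa' AND postal.pid = prefecture.pid;
```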
--
Tatsuo Ishii