From: | "Net Virtual Mailing Lists" <mailinglists(at)net-virtual(dot)com> |
---|---|
To: | "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com> |
Cc: | "Pgsql General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Peculiar performance observation.... |
Date: | 2005-03-15 03:14:31 |
Message-ID: | 20050315031431.1274@mail.net-virtual.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
>> Hello,
>>
>>
>> I am sorry to bring this up again.... Does anyone have any idea what
>> might be going on here?... I'm very worried about this situation.. ;-(
>
>It looks to me like either you're not analyzing often enough, or your
>statistics target is too low to get a good sample. Note your estimated
>versus real rows are off by a factor of 70 (28 est. versus 1943 actual
>rows). That's a pretty big difference, and where you should be looking.
>
>> > -> Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual
>> >time=11.498..4800.907 rows=1943 loops=1)
>
>Yes, this is because PostgreSQL is using an index to approximate a
>sequential scan, which is not a good thing since PostgreSQL can't get
>all the information it needs from just an index, but has to visit the
>table to check visibility.
>
All of these were after a vacuum full analyze, which I actually do
nightly on the database.
I probably confused the issue with all of my posts, this is the query
which has me concerned. When running it on my system here, the disk
thrashes (and I mean *THRASHES*) the entire 12-20 seconds it takes to
run... WHen running on our production servers, I can't hear the disk,
but see an equally troubling performance loss when using the index.
database=> explain analyze select id from table1 where category <@ 'a.b';
QUERY
PLAN
-------------------------------------
-------------------------------------
-------------------------------------------------------------------------
Index Scan using table1_category_full_gist_idx on jobdata
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
Index Cond: (category <@ 'a.b'::ltree)
Filter: (category <@ 'a.b'::ltree)
Total runtime: 12222.258 ms
I can do this to speed things up (this results in very little disk
activity, certainly not the thrashing the original query did):
create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs=> explain analyze select id from table1 where id in (select id from
yuck where category <@ 'a.b');
QUERY PLAN
-------------------------------------
-------------------------------------
-------------------------------------------------------------
Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
-> HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
-> Index Scan using category_idx on yuck (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
Index Cond: (category <@ 'a.b'::ltree)
Filter: (category <@ 'a.b'::ltree)
-> Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
Index Cond: (table1.id = "outer".id)
Total runtime: 1261.551 ms
(8 rows)
If I drop the index "table1_category_full_gist_idx", the query speeds up
dramatically (10-15 times faster on both dev and prod uction systems).
So my concern, in short: why is it so much slower when actually using an
index and why is it trying to make mince meat out of my hard drive?
- Greg
From | Date | Subject | |
---|---|---|---|
Next Message | Yu Jie | 2005-03-15 03:42:00 | Question about database restrict |
Previous Message | Jack Orenstein | 2005-03-15 02:37:09 | Re: Logging VACUUM activity |