Re: Peculiar performance observation....

From: "Net Virtual Mailing Lists" <mailinglists(at)net-virtual(dot)com>
To: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>
Cc: "Pgsql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Peculiar performance observation....
Date: 2005-03-15 03:14:31
Message-ID: 20050315031431.1274@mail.net-virtual.com
Lists: pgsql-general

>On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
>> Hello,
>>
>>
>> I am sorry to bring this up again.... Does anyone have any idea what
>> might be going on here?... I'm very worried about this situation.. ;-(
>
>It looks to me like either you're not analyzing often enough, or your
>statistics target is too low to get a good sample. Note your estimated
>versus real rows are off by a factor of 70 (28 est. versus 1943 actual
>rows). That's a pretty big difference, and where you should be looking.
>
>> > -> Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual
>> >time=11.498..4800.907 rows=1943 loops=1)
>
>Yes, this is because PostgreSQL is using an index to approximate a
>sequential scan, which is not a good thing since PostgreSQL can't get
>all the information it needs from just an index, but has to visit the
>table to check visibility.
>

All of these were after a vacuum full analyze, which I actually do
nightly on the database.
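In case it helps anyone following along, Scott's statistics-target suggestion can be applied per column like this (a sketch only; it assumes the relevant column is `category`, as in the plans below):

```sql
-- Raise the per-column sample size above the default so ANALYZE
-- collects a larger histogram/MCV sample for the category column.
ALTER TABLE table1 ALTER COLUMN category SET STATISTICS 100;
ANALYZE table1;

-- Then re-check whether the row estimate improves:
-- EXPLAIN ANALYZE SELECT id FROM table1 WHERE category <@ 'a.b';
```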

I probably confused the issue with all of my posts, so here is the query
which has me concerned. When running it on my system here, the disk
thrashes (and I mean *THRASHES*) for the entire 12-20 seconds it takes to
run... When running it on our production servers, I can't hear the disk,
but I see an equally troubling performance loss when using the index.

database=> explain analyze select id from table1 where category <@ 'a.b';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using table1_category_full_gist_idx on jobdata
     (cost=0.00..113.48 rows=28 width=4)
     (actual time=43.814..12201.528 rows=1943 loops=1)
   Index Cond: (category <@ 'a.b'::ltree)
   Filter: (category <@ 'a.b'::ltree)
 Total runtime: 12222.258 ms

I can do this to speed things up (this results in very little disk
activity, certainly not the thrashing the original query did):

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs=> explain analyze select id from table1 where id in (select id from
yuck where category <@ 'a.b');
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=108.64..114.28 rows=1 width=52)
     (actual time=654.645..1245.212 rows=1943 loops=1)
   ->  HashAggregate  (cost=108.64..108.64 rows=1 width=4)
         (actual time=654.202..690.709 rows=1943 loops=1)
         ->  Index Scan using category_idx on yuck  (cost=0.00..108.57
             rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
               Index Cond: (category <@ 'a.b'::ltree)
               Filter: (category <@ 'a.b'::ltree)
   ->  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
       width=52) (actual time=0.219..0.235 rows=1 loops=1943)
         Index Cond: (table1.id = "outer".id)
 Total runtime: 1261.551 ms
(8 rows)

If I drop the index "table1_category_full_gist_idx", the query speeds up
dramatically (10-15 times faster on both development and production systems).

So my concern, in short: why is it so much slower when actually using an
index, and why is it trying to make mincemeat out of my hard drive?
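(For anyone who wants to reproduce the comparison without dropping the
index: the planner can be discouraged from the index scan per session.
This is just a sketch using the standard enable_* planner settings:)

```sql
-- Temporarily steer the planner away from index scans,
-- so the same query can be timed with a sequential scan:
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT id FROM table1 WHERE category <@ 'a.b';

-- Restore the default afterwards:
SET enable_indexscan = on;
```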

- Greg
