From: | Vitaly Belman <vitalib(at)012(dot)net(dot)il> |
---|---|
To: | Marty Scholes <marty(at)outputservices(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Nick Barr <nicky(at)chuckie(dot)co(dot)uk> |
Cc: | Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: PostgreSQL caching |
Date: | 2004-05-26 14:33:56 |
Message-ID: | 1673440625.20040526173356@012.net.il |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Marty, Nick and Robert,
NB> Depending on what version of PG you are running, IN might take a while
NB> to complete. If so try an EXISTS instead
RT> A question and two experiments... what version of postgresql is this?
I am using the newer 7.5dev native Windows port. For this reason I
don't think that IN will cause any trouble (I read that this issue was
resolved in 7.4).
MS> At any rate, a query with an IN clause should help quite a bit
MS> SELECT bv_books. *
MS> FROM bv_books
MS> WHERE bv_books.book_id IN (
MS> SELECT book_id
MS> FROM bv_genres
MS> WHERE bv_bookgenres.genre_id = 5830
MS> )
MS> ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;
It looks like it helps a bit (though you meant "FROM bv_bookgenres",
right?). I can't tell you how MUCH it helped though, because of two
reasons:
1) As soon as I run a query, it is cached in the memory and I can't
really find a good way to flush it out of there to test again except a
full computer reset (shutting postmaster down doesn't help). If you
have a better idea on this, do tell me =\ (Reminding again, I am on
Windows).
2) I *think* I resolved this issue, at least for most of the genre_ids
(didn't go through them all, but tried a few with different book count
and the results looked quite good). The fault was partly mine, a few
weeks ago I increase the statistics for the genre_id column a bit too
much (from 10 to 70), I was unsure how exactly it works (and still am)
but it helped for a few genre_ids that had a high book count, yet it
also hurt the performence for the genres without as much ids. I now
halved the stastics (to 58) and almost everything looks good now.
Because of that I'll stop working on that query for a while (unless
you have some more performance tips on the subject). Big thanks to
everyone who helped.. And I might bring this issue later again, it it
still will cause too much troubles.
RT> Try reindexing i_bookgenres_genre_id and capture the explain
RT> analyze for that.
Is that's what you meant "REINDEX INDEX i_bookgenres_genre_id"? But it
returns no messages what-so-ever =\. I can EXPLAIN it either.
RT> If it doesn't help try doing set enable_indexscan = false and
RT> capture the explain analyze for that.
Here it is:
------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN
Limit (cost=41099.93..41099.96 rows=10 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
-> Sort (cost=41099.93..41100.45 rows=208 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
Sort Key: bv_books.vote_count
-> Merge Join (cost=40229.21..41091.92 rows=208 width=76) (actual time=6078.000..6593.000 rows=1993 loops=1)
Merge Cond: ("outer".book_id = "inner".book_id)
-> Sort (cost=16817.97..16818.49 rows=208 width=4) (actual time=1062.000..1062.000 rows=1993 loops=1)
Sort Key: bv_bookgenres.book_id
-> Seq Scan on bv_bookgenres (cost=0.00..16809.96 rows=208 width=4) (actual time=0.000..1047.000 rows=1993 loops=1)
Filter: (genre_id = 5830)
-> Sort (cost=23411.24..23841.04 rows=171918 width=76) (actual time=5016.000..5189.000 rows=171801 loops=1)
Sort Key: bv_books.book_id
-> Seq Scan on bv_books (cost=0.00..4048.18 rows=171918 width=76) (actual time=0.000..359.000 rows=171918 loops=1)
Total runtime: 6734.000 ms
------------------------------------------------------------------------------------------------------------------------------------------
Regards,
Vitaly Belman
ICQ: 1912453
AIM: VitalyB1984
MSN: tmdagent(at)hotmail(dot)com
Yahoo!: VitalyBe
Wednesday, May 26, 2004, 1:24:18 AM, you wrote:
MS> Vitaly,
MS> This looks like there might be some room for performance improvement...
>> MS> I didn't see the table structure, but I assume
>> MS> that the vote_avg and
>> MS> vote_count fields are in bv_bookgenres.
>>
>> I didn't understand you. vote_avg is stored in bv_books.
MS> Ok. That helps. The confusion (on my end) came from the SELECT clause
MS> of the query you provided:
>> SELECT bv_books. * ,
>> vote_avg,
>> vote_count
MS> All fields from bv_books were selected (bv_books.*) along with vote_agv
MS> and vote_count. My assumption was that vote_avg and vote_count were
MS> therefore not in bv_books.
MS> At any rate, a query with an IN clause should help quite a bit:
MS> SELECT bv_books. *
MS> FROM bv_books
MS> WHERE bv_books.book_id IN (
MS> SELECT book_id
MS> FROM bv_genres
MS> WHERE bv_bookgenres.genre_id = 5830
MS> )
MS> ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;
MS> Give it a whirl.
MS> Marty
MS> ---------------------------(end of
MS> broadcast)---------------------------
MS> TIP 6: Have you searched our list archives?
From | Date | Subject | |
---|---|---|---|
Next Message | Mario Soto | 2004-05-26 15:26:30 | performance very slow |
Previous Message | Robert Treat | 2004-05-26 13:29:06 | Re: where to find out when a table was last analyzed? |