Quick Links

Re: How can I make this query faster (resend)

From:	"Cstdenis" <cstdenis(at)voicio(dot)com>
To:	"Nis Jorgensen" <nis(at)superlativ(dot)dk>
Cc:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Re: How can I make this query faster (resend)
Date:	2006-05-29 14:35:14
Message-ID:	013801c6832d$1cf73d90$6401a8c0@chris
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

(Resending because my other send didn't get a CC to the list)

From: "Jim C. Nasby" <jnasby(at)pervasive(dot)com>
To: "Cstdenis" <cstdenis(at)voicio(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Sent: Monday, May 22, 2006 8:20 AM
Subject: Re: [PERFORM] How can I make this query faster (resend)

> On Fri, May 19, 2006 at 03:56:49PM -0700, Cstdenis wrote:
> > (Its been a hour and I dont see my message on the list so I'm sending it
again. I've moved the queries and analyze out of the email incase it was
rejected because too long)
> >
> > query: http://pastebin.ca/57218
> >
> > In the pictures table all the ratings have a shared index
> >
> > CREATE INDEX idx_rating ON pictures USING btree (rating_nudity,
rating_violence, rating_sex, rating_racism, rating_spoilers, rating_yaoi,
rating_yuri, rating_profanity);
> >
> > and approved and date_submitted and user_id also have their own btree
indexes.
> >
> > In the picture_categories table pid and cat_id have their own btree
indices plus one together.
> >
> > Full table definition: http://pastebin.ca/57219
> >
> > the cat_id and rating values vary from query to query. The one listed
above took 54 seconds in a test run just now. Here is explain analyze:
http://pastebin.ca/57220
>
> pictures is the interesting table here. It looks like the planner would
> do better to choose something other than a nested loop on it. Try
> running EXPLAIN ANALYZE on the query with enable_nestloop=off and see
> what you get (you'll need to compare it to what you get with
> enable_nestloop on to see what the change is).

With enable_nestloop=off the same query as is explained further down in this
email took much longer 63 seconds insted of 6. It decided to do sequencial
scans on pictures and users with nested loop disabled.

Merge Join (cost=146329.63..146963.96 rows=231 width=66) (actual
time=61610.538..62749.176 rows=1305 loops=1)
Merge Cond: ("outer".user_id = "inner".user_id)
-> Sort (cost=123828.88..123829.46 rows=231 width=47) (actual
time=60445.367..60451.176 rows=1305 loops=1)
Sort Key: pictures.user_id
-> Hash Join (cost=634.36..123819.81 rows=231 width=47) (actual
time=128.088..60423.623 rows=1305 loops=1)
Hash Cond: ("outer".pid = "inner".pid)
-> Seq Scan on pictures (cost=0.00..121670.43 rows=302543
width=47) (actual time=0.210..58795.925 rows=291318 loops=1)
-> Hash (cost=633.78..633.78 rows=231 width=4) (actual
time=38.443..38.443 rows=1305 loops=1)
-> Bitmap Heap Scan on picture_categories
(cost=2.81..633.78 rows=231 width=4) (actual time=4.753..32.259 rows=1305
loops=1)
Recheck Cond: (cat_id = 182)
-> Bitmap Index Scan on
idx_picture_categories_cat_id (cost=0.00..2.81 rows=231 width=0) (actual
time=4.398..4.398 rows=1305 loops=1)
Index Cond: (cat_id = 182)
-> Sort (cost=22500.74..22816.79 rows=126418 width=23) (actual
time=1163.788..1505.104 rows=52214 loops=1)
Sort Key: users.user_id
-> Seq Scan on users (cost=0.00..11788.18 rows=126418 width=23)
(actual time=0.017..692.992 rows=54605 loops=1)
Total runtime: 62776.720 ms

> > Both pictures and picture categories have about 287,000 rows
> >
> > This query needs to run in under about a second or it kills my site by
clogging apache slots (apache maxes out at 256 and I can have several
hundred people on my site at a time). How can I make it run faster?
> >
> >
> > Server is a dual xeon with a gig of ram dedicated mostly to postgresql.
> > Here is the changed lines in my postgresql.conf:
http://pastebin.ca/57222
>
> I suspect the low work_mem may be why it's using a nested loop. In
> addition to the test above, it would be interesting to see what happens
> to the plan if you set work_mem to 10000.

I moved to a more powerful server (2gb ram and mirrored scsi HDs) and upped
the work mem to 10mb. Its much faster now, however its still doing a nested
loop. (see also my reply to Markus Schaber)

Nested Loop (cost=2.81..3398.76 rows=231 width=66) (actual
time=14.946..5797.701 rows=1305 loops=1)
-> Nested Loop (cost=2.81..2022.71 rows=231 width=47) (actual
time=14.551..5181.042 rows=1305 loops=1)
-> Bitmap Heap Scan on picture_categories (cost=2.81..633.78
rows=231 width=4) (actual time=9.966..140.606 rows=1305 loops=1)
Recheck Cond: (cat_id = 182)
-> Bitmap Index Scan on idx_picture_categories_cat_id
(cost=0.00..2.81 rows=231 width=0) (actual time=9.720..9.720 rows=1305
loops=1)
Index Cond: (cat_id = 182)
-> Index Scan using pictures_pkey on pictures (cost=0.00..6.00
rows=1 width=47) (actual time=3.802..3.820 rows=1 loops=1305)
Index Cond: (pictures.pid = "outer".pid)
-> Index Scan using users_pkey on users (cost=0.00..5.94 rows=1
width=23) (actual time=0.095..0.100 rows=1 loops=1305)
Index Cond: ("outer".user_id = users.user_id)
Total runtime: 5812.238 ms

> To be honest, you're pushing things expecting a machine with only 1G to
> serve 300 active connections. How large is the database itself?

The database is 3.7G on disk. There is about 1G of actual data in it -- the
rest is dead tuples and indices. (I vacuum regularly, but a vacuum full
causes too much downtime to do unless I have to)

> > I know hyperthreading is considered something that can slow down a
server but with my very high concurancy (averages about 400-500 concurant
users during peak hours) I am hoping the extra virtual CPUs wil help. Anyone
have experance that says diferent at high concurancy?
>
> Best bet is to try it and see. Generally, people find HT hurts, but I
> recently saw it double the performance of pgbench on a windows XP
> machine, so it's possible that windows is just more clever about how to
> use it than linux is.

Anyone know if those who have found it hurts are low concurancy complex cpu
intensive queries or high concurancy simple queries or both? I can
understand it hurting in the former, but not the later. I'll have to give it
a try I guess. It should at least help my very high load averages.

> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

In response to

How can I make this query faster (resend) at 2006-05-19 22:56:49 from Cstdenis

Responses

Re: How can I make this query faster (resend) at 2006-06-05 14:35:29 from Jim C. Nasby

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Anton Maksimenkov	2006-05-29 15:39:51	Re: select with "like" from another table
Previous Message	Cstdenis	2006-05-29 14:33:18	Re: How can I make this query faster (resend)