From: | Glenn Maynard <glennfmaynard(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Nested query performance issue |
Date: | 2009-04-08 21:09:24 |
Message-ID: | d18085b50904081409xcd4dc21rbc5ea6f71a61dd1a@mail.gmail.com |
Lists: | pgsql-performance |
(This is related to an earlier post on -sql.)
I'm querying for the N highest scores for each game, with two tables:
score and game.
CREATE TABLE game (id SERIAL NOT NULL PRIMARY KEY);
CREATE TABLE score (id SERIAL NOT NULL PRIMARY KEY, score REAL,
    game_id INTEGER REFERENCES game (id));
-- test data: 1000 games, 100000 scores
INSERT INTO game (id) SELECT generate_series(1,1000);
INSERT INTO score (game_id, score)
    SELECT game.id, random() FROM game, generate_series(1,100);
CREATE INDEX score_idx1 ON score (game_id, score DESC);
ANALYZE;
This query retrieves the single highest score for each game, but
doesn't allow any more than that--I can't get the top five scores for
each game. However, it's very fast: on the above test data, it runs
in 25ms on my system. With 1000000 scores, it takes 40ms.
SELECT s.* FROM score s
WHERE s.id IN (
    -- Get the high scoring score ID for each game:
    SELECT
    (
        -- Get the high score for game g:
        SELECT s2.id FROM score s2 WHERE s2.game_id = g.id
        ORDER BY s2.score DESC LIMIT 1
    )
    FROM game g
);
This rewrite allows getting the top N scores. Unfortunately, this one
takes 950ms for the same data. With 1000000 scores, it takes 14800ms.
SELECT s.* FROM score s, game g
WHERE s.game_id = g.id AND
    s.id IN (
        SELECT s2.id FROM score s2 WHERE s2.game_id = g.id
        ORDER BY s2.score DESC LIMIT 1
    );
This seems simple: for each game, search for the highest score, and
then scan the tree to get the next N-1 highest scores. The first
version does just that, but the second one is doing a seq scan over
score.
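The plan difference can be checked with EXPLAIN ANALYZE. This is a minimal sketch that uses only the tables defined above. The output is omitted because it depends on the server version and settings.

```sql
-- Run the slow form and print the plan actually used, with timings.
-- A "Seq Scan on score" node in the output corresponds to the
-- sequential scan described above.
EXPLAIN ANALYZE
SELECT s.* FROM score s, game g
WHERE s.game_id = g.id AND
    s.id IN (
        SELECT s2.id FROM score s2 WHERE s2.game_id = g.id
        ORDER BY s2.score DESC LIMIT 1
    );
```

Running the same command on the first form allows a side-by-side comparison of the two plans.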
I do want to be able to use a LIMIT higher than 1, which only works
with the second form. Any suggestions of how to get the efficient
scanning of the first while being able to use a LIMIT greater than 1?
(It'd even be faster to make several calls to the first version,
varying an OFFSET to get each high score--but that's terrible.)
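For reference, the OFFSET workaround would look roughly like this. It is a sketch of the first form with OFFSET 1 added, which returns the second-highest score for each game. Running it once per offset from 0 to N-1 and concatenating the results gives the top N.

```sql
-- Second-highest score per game: the fast form with OFFSET 1.
-- Games with fewer than two scores yield NULL from the inner
-- subquery, which matches no row in the outer IN.
SELECT s.* FROM score s
WHERE s.id IN (
    SELECT
    (
        SELECT s2.id FROM score s2 WHERE s2.game_id = g.id
        ORDER BY s2.score DESC LIMIT 1 OFFSET 1
    )
    FROM game g
);
```

One caveat: when a game has tied scores, ORDER BY s2.score DESC alone does not fix the order of the tied rows, so separate calls with different offsets are not guaranteed to agree on which row sits at which position.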
--
Glenn Maynard