Re: Not sure which part of the query needs optimization

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Not sure which part of the query needs optimization
Date: 2019-01-07 19:40:47
Message-ID: 87ef9oqnr2.fsf@news-spur.riddles.org.uk
Lists: pgsql-general

>>>>> "Alexander" == Alexander Farber <alexander(dot)farber(at)gmail(dot)com> writes:

Alexander> Good afternoon,

Alexander> for each visitor of my website I generate a JSON list of 30
Alexander> top players ( https://slova.de/words/top.php ), who played
Alexander> in the past week, with their average scores and average time
Alexander> between moves.

-> Parallel Seq Scan on words_moves m_1
       (cost=0.00..73600.05 rows=8666 width=16)
       (actual time=0.761..130.844 rows=11074 loops=3)
   Filter: (played > (CURRENT_TIMESTAMP - '7 days'::interval))
   Rows Removed by Filter: 492241

This is telling you that an index on words_moves(played) would likely
help.
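
Something along these lines (the index name is illustrative; on a live
table you would probably want CREATE INDEX CONCURRENTLY):

CREATE INDEX words_moves_played_idx ON words_moves (played);

With that in place the planner can fetch just the last week's moves
rather than scanning the whole table and discarding most of its rows.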

But the real hot point in the query is here (reformatted for clarity):

-> Aggregate (cost=6097.83..6097.84 rows=1 width=32)
             (actual time=19.401..19.401 rows=1 loops=30)
   -> Bitmap Heap Scan on words_moves
          (cost=33.97..6093.45 rows=1748 width=4)
          (actual time=1.680..18.153 rows=15011 loops=30)
      Recheck Cond: (uid = u.uid)
      Heap Blocks: exact=216312
      -> Bitmap Index Scan on words_moves_uid_idx
             (cost=0.00..33.54 rows=1748 width=0)
             (actual time=0.979..0.979 rows=15011 loops=30)
         Index Cond: (uid = u.uid)

(the explain.depesz.com view points this out with an orange highlight)

This corresponds to this subquery:

(SELECT ROUND(AVG(score), 1)
   FROM words_moves
  WHERE uid = u.uid) AS avg_score,

The basic issue here is that you're calculating an average over around
15k rows per user, for each user in the query result (so 30 times,
making 450k rows). You can see from the "Heap Blocks" stat that this is
having to scan a lot of data; it's taking on average 19.4ms per user,
but multiply that by 30 users and you get ~580ms total, or about 70% of
the total execution time.

The obvious thing to do is to keep a computed average score for each
user - either in a separate table which you update based on changes to
words_moves (which you could do with a trigger), or in a materialized
view which you refresh at suitable intervals (this has the drawback
that the data will not be immediately up to date).
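
A minimal sketch of the materialized view variant (untested; the view
name is mine, and I'm assuming only the uid and score columns already
used in your subquery):

CREATE MATERIALIZED VIEW words_avg_score AS
    SELECT uid, ROUND(AVG(score), 1) AS avg_score
      FROM words_moves
     GROUP BY uid;

CREATE UNIQUE INDEX ON words_avg_score (uid);

-- run at suitable intervals, e.g. from cron; CONCURRENTLY avoids
-- blocking readers and requires the unique index above
REFRESH MATERIALIZED VIEW CONCURRENTLY words_avg_score;

Your per-user subquery then becomes a single-row index lookup:

(SELECT avg_score FROM words_avg_score s WHERE s.uid = u.uid) AS avg_score,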

Combining these two changes (the index on "played" and the precomputed
averages) should get you to under 100ms, maybe.

--
Andrew (irc:RhodiumToad)
