increasing effective_cache_size slows down join queries by a factor of 4000x

From: Artyom Shaposhnikov <artyom(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: increasing effective_cache_size slows down join queries by a factor of 4000x
Date: 2022-02-02 19:05:02
Message-ID: CA+1Wm9U_sP9237f7OH7O=-UTab71DWOO4Qc-vnC78DfsJQBCwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I recently moved a postgres DB to a more powerful server with 1TB of
RAM instead of 64GB before. To my surprise after running the tuning on
the new server, the performance of joins deteriorated by 4000x
compared to the old server. I carefully examined all of the changes
and found the culprit:

if I use the effective_cache_size > 25GB, the query plan used is this:

Limit (cost=124.12..590.33 rows=1000 width=205) (actual
time=42326.662..42336.212 rows=1000 loops=1)
-> Nested Loop (cost=124.12..31909018.46 rows=68443040 width=205)
(actual time=42326.660..42336.127 rows=1000 loops=1)
-> Merge Join (cost=124.12..30710356.97 rows=68443040 width=169)
(actual time=42326.613..42332.764 rows=1000 loops=1)
Merge Cond: (d.id = dc.data_id)
-> Nested Loop (cost=1.00..31036282.72 rows=58785023 width=165)
(actual time=0.042..5.533 rows=854 loops=1)
-> Index Scan using data_pkey on data t (cost=0.57..4787030.00
rows=58785023 width=131) (actual time=0.023..0.526 rows=854 loops=1)
Index Cond: (id > 205284974)
-> Index Scan using user_pkey on data_user u (cost=0.43..0.45 rows=1
width=42) (actual time=0.005..0.005 rows=1 loops=854)
Index Cond: (id = d.user_id)
-> Index Only Scan using data_class_pkey on data_class ta
(cost=0.57..4935483.78 rows=216964862 width=8) (actual
time=0.018..35022.908 rows=151321889 loops=1)
Heap Fetches: 151321889
-> Index Scan using class_id_index on class a (cost=0.00..0.02 rows=1
width=44) (actual time=0.003..0.003 rows=1 loops=1000)
Index Cond: (id = dc.class_id)
Planning Time: 4.114 ms
Execution Time: 42336.397 ms

and it is 4000x slower than the query plan used with the lower
effective_cache_size that uses indexes instead of the merge joins:

Limit (cost=1.57..4832.30 rows=1000 width=205) (actual
time=0.081..10.457 rows=1000 loops=1)
-> Nested Loop (cost=1.57..330629805.46 rows=68443040 width=205)
(actual time=0.080..10.378 rows=1000 loops=1)
-> Nested Loop (cost=1.57..267793481.39 rows=68443040 width=169)
(actual time=0.065..7.496 rows=1000 loops=1)
-> Nested Loop (cost=1.00..100917823.18 rows=58785023 width=165)
(actual time=0.040..5.424 rows=854 loops=1)
-> Index Scan using data_pkey on data t (cost=0.57..21427806.53
rows=58785023 width=131) (actual time=0.024..0.482 rows=854 loops=1)
Index Cond: (id > 205284974)
-> Index Scan using user_pkey on data_user u (cost=0.43..1.35 rows=1
width=42) (actual time=0.005..0.005 rows=1 loops=854)
Index Cond: (id = d.user_id)
-> Index Only Scan using data_class_pkey on data_class ta
(cost=0.57..2.80 rows=4 width=8) (actual time=0.002..0.002 rows=1
loops=854)
Index Cond: (data_id = d.id)
Heap Fetches: 1000
-> Index Scan using class_id_index on class a (cost=0.00..0.92 rows=1
width=44) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (id = dc.class_id)
Planning Time: 5.074 ms
Execution Time: 10.614 ms

query:

explain analyze select d.time as time,d.id as id, u.username as
username, a.query_symbol as query_symbol from data as d, data_user as
u, class as a, data_class as dc
where dc.class_id = a.id and dc.data_id = d.id and d.user_id = u.id
and d.id > 205284974
order by d.id
limit 1000;

I found a way to fix it by creating a distinct statistics on the
largest join table:

create statistics stat_data_class (ndistinct) on data_id, class_id
from data_class;
analyze data_class;

Question: are there other ways to give PostgreSQL a hint to use the
indexes instead of the merge join plan without artificially lowering
the memory available in the parameter effective_cache_size or creating
the additional statistics on the table?

Thank you,

-Art

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2022-02-02 19:46:39 Re: increasing effective_cache_size slows down join queries by a factor of 4000x
Previous Message Laura Smith 2022-02-02 18:35:37 Re: pg_basebackup with hostssl ?