From: Ed Sabol <edwardjsabol(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Major performance degradation with joins in 15.8 or 15.7?
Date: 2024-11-08 20:32:54
Message-ID: 4190FC09-590B-443F-9064-88B6F7C1EBE2@gmail.com
Lists: pgsql-performance
On Nov 7, 2024, at 9:54 PM, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 11/8/24 09:45, Ed Sabol wrote:
>> On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>> Postgres didn't want Materialize in this example because of the low estimation on its outer subquery. AFAIC, by increasing the *_page_cost's value, you added extra weight to the inner subquery
>> What kind of extended statistics do you suggest for this? ndistinct, dependencies, or mcv?
>> CREATE STATISTICS tablename_stats (<statistics type?>) ON relation, type FROM tablename;
>> ANALYZE tablename;
> I'd recommend using all of them: MCV is helpful in most cases (and relatively cheap); ndistinct is actually used in Postgres now to calculate the number of groups (GROUP BY, Sort, Memoize, etc.); dependencies finds correlations between columns, usually in scan filters.
OK, I've executed the following:
CREATE STATISTICS tablename_stats_rt_nd (ndistinct) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_mcv (mcv) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_dep (dependencies) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rv_nd (ndistinct) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_rv_mcv (mcv) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_rv_dep (dependencies) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_nr_nd (ndistinct) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_mcv (mcv) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_dep (dependencies) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nt_nd (ndistinct) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_mcv (mcv) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_dep (dependencies) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nv_nd (ndistinct) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_mcv (mcv) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_dep (dependencies) ON name, value FROM tablename;
ANALYZE tablename;
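(As an aside, these could probably be consolidated: a single extended statistics object can carry all three kinds at once, so one object per column pair would suffice. A sketch, with illustrative object names:)

```sql
-- One extended statistics object per column pair, covering all three kinds.
-- Object names are illustrative.
CREATE STATISTICS tablename_stats_rt (ndistinct, mcv, dependencies)
    ON relation, type FROM tablename;
ANALYZE tablename;

-- Confirm what was built:
SELECT statistics_name, kinds
FROM pg_stats_ext
WHERE tablename = 'tablename';
```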
Now, with random_page_cost = 4.0, the optimizer chooses the Materialize plan again, and the query is fast:
Nested Loop (cost=1226.12..11129.87 rows=1 width=112) (actual time=30.965..31.333 rows=1 loops=1)
Join Filter: (a.name = d.name)
Buffers: shared hit=7447
-> Nested Loop (cost=1225.70..11112.51 rows=1 width=108) (actual time=30.921..31.208 rows=1 loops=1)
Buffers: shared hit=7418
-> Hash Join (cost=1225.27..11093.62 rows=1 width=86) (actual time=30.862..31.078 rows=1 loops=1)
Hash Cond: ((a.name || '.doc'::text) = b_1.name)
Buffers: shared hit=7389
-> Nested Loop (cost=1167.53..11019.89 rows=11 width=70) (actual time=27.143..27.347 rows=1 loops=1)
Join Filter: (CASE WHEN ("position"(a.name, 'zz'::text) = 1) THEN a.name ELSE ('h_'::text || a.name) END = "*SELECT* 1".table_name)
Rows Removed by Join Filter: 1021
Buffers: shared hit=6268
-> Bitmap Heap Scan on tablename a (cost=456.55..5407.28 rows=1077 width=38) (actual time=2.986..15.865 rows=1022 loops=1)
Recheck Cond: (relation = 'description'::text)
Filter: (type = 'table'::text)
Rows Removed by Filter: 37044
Heap Blocks: exact=4024
Buffers: shared hit=4065
-> Bitmap Index Scan on tablename_r (cost=0.00..456.29 rows=38915 width=0) (actual time=2.336..2.336 rows=44980 loops=1)
Index Cond: (relation = 'description'::text)
Buffers: shared hit=41
-> Materialize (cost=710.98..5564.15 rows=2 width=64) (actual time=0.008..0.009 rows=1 loops=1022)
Buffers: shared hit=2203
-> Append (cost=710.98..5564.14 rows=2 width=64) (actual time=7.519..7.548 rows=1 loops=1)
Buffers: shared hit=2203
-> Subquery Scan on "*SELECT* 1" (cost=710.98..3537.89 rows=1 width=64) (actual time=6.629..6.636 rows=0 loops=1)
Buffers: shared hit=1380
-> Bitmap Heap Scan on tablename (cost=710.98..3537.88 rows=1 width=96) (actual time=6.628..6.633 rows=0 loops=1)
Recheck Cond: ((relation = ANY ('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[])) AND (type = 'table'::text))
Filter: ((CASE relation WHEN 'Mission'::text THEN upper(value) ELSE value END = 'foo'::text) AND (CASE relation WHEN 'defaultSearchRadius'::text THEN 'default_search_radius'::text WHEN 'Mission'::text THEN 'o_name'::text WHEN 'priority'::text THEN 'table_priority'::text WHEN 'bibcode'::text THEN 'catalog_bibcode'::text WHEN 'regime'::text THEN 'frequency_regime'::text WHEN 'author'::text THEN 'table_author'::text WHEN 'tableType'::text THEN 'table_type'::text WHEN 'subject'::text THEN 'row_type'::text ELSE relation END = 'o_name'::text))
Rows Removed by Filter: 8253
Heap Blocks: exact=1276
Buffers: shared hit=1380
-> BitmapAnd (cost=710.94..710.94 rows=1275 width=0) (actual time=3.346..3.350 rows=0 loops=1)
Buffers: shared hit=104
-> Bitmap Index Scan on tablename_r (cost=0.00..134.96 rows=9145 width=0) (actual time=0.573..0.574 rows=9998 loops=1)
Index Cond: (relation = ANY ('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[]))
Buffers: shared hit=49
-> Bitmap Index Scan on tablename_t (cost=0.00..575.73 rows=49507 width=0) (actual time=2.693..2.693 rows=59373 loops=1)
Index Cond: (type = 'table'::text)
Buffers: shared hit=55
-> Subquery Scan on "*SELECT* 5" (cost=10.28..2026.24 rows=1 width=64) (actual time=0.886..0.904 rows=1 loops=1)
Buffers: shared hit=823
-> Bitmap Heap Scan on tablename tablename_1 (cost=10.28..2026.23 rows=1 width=96) (actual time=0.884..0.899 rows=1 loops=1)
Recheck Cond: (relation = 'containedBy'::text)
Filter: ((substr(value, 1, 8) = 'mission:'::text) AND (upper("substring"(value, 9)) = 'foo'::text))
Rows Removed by Filter: 721
Heap Blocks: exact=820
Buffers: shared hit=823
-> Bitmap Index Scan on tablename_r (cost=0.00..10.28 rows=781 width=0) (actual time=0.085..0.085 rows=905 loops=1)
Index Cond: (relation = 'containedBy'::text)
Buffers: shared hit=3
-> Hash (cost=44.87..44.87 rows=1030 width=38) (actual time=5.334..5.342 rows=1025 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=1121
-> Bitmap Heap Scan on tablename b_1 (cost=33.06..44.87 rows=1030 width=38) (actual time=1.157..4.018 rows=1025 loops=1)
Recheck Cond: ((relation = 'located'::text) AND (type = 'document'::text))
Heap Blocks: exact=1113
Buffers: shared hit=1121
-> BitmapAnd (cost=33.06..33.06 rows=3 width=0) (actual time=0.765..0.769 rows=0 loops=1)
Buffers: shared hit=8
-> Bitmap Index Scan on tablename_r (cost=0.00..16.15 rows=1030 width=0) (actual time=0.347..0.347 rows=1227 loops=1)
Index Cond: (relation = 'located'::text)
Buffers: shared hit=4
-> Bitmap Index Scan on tablename_t (cost=0.00..16.15 rows=1030 width=0) (actual time=0.314..0.315 rows=1227 loops=1)
Index Cond: (type = 'document'::text)
Buffers: shared hit=4
-> Index Scan using tablename_n on tablename c (cost=0.42..18.88 rows=1 width=22) (actual time=0.048..0.115 rows=1 loops=1)
Index Cond: (name = a.name)
Filter: (relation = 'lastUpdated'::text)
Rows Removed by Filter: 58
Buffers: shared hit=29
-> Index Scan using tablename_n on tablename d (cost=0.42..17.33 rows=1 width=22) (actual time=0.034..0.104 rows=1 loops=1)
Index Cond: (name = c.name)
Filter: (relation = 'rowcount'::text)
Rows Removed by Filter: 58
Buffers: shared hit=29
Planning:
Buffers: shared hit=64
Planning Time: 5.086 ms
Execution Time: 32.226 ms
(81 rows)
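(In case it's useful to anyone comparing the two cost settings: the planner can be A/B-tested per transaction without touching postgresql.conf, something like the following, where the SELECT stands in for the query under test:)

```sql
BEGIN;
SET LOCAL random_page_cost = 4.0;   -- reverts automatically at ROLLBACK/COMMIT
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the query under test
ROLLBACK;
```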
This was a nice learning experience and I hope it will help with performance going forward, but I still think I'm going to keep random_page_cost = 2.0.
None of this really explains why this became a problem after ~10 years of it not being one. The only likely explanation I can see is that the table grew gradually over time and eventually crossed some threshold at which the optimizer switched to a much worse plan.
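(If it helps test the gradual-growth theory, pg_stat_user_tables shows the current live-row estimate and when the planner's statistics were last refreshed:)

```sql
-- Row count estimate and most recent manual/automatic ANALYZE times:
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tablename';
```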
Thanks,
Ed