From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2020-12-10 16:44:03 |
Message-ID: | 4bbece40-d648-b6f9-a323-fcb8f4b45e78@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09.12.2020 23:53, David Rowley wrote:
> On Tue, 8 Dec 2020 at 20:15, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> I've attached another patchset that addresses some comments left by
>> Zhihong Yu over on [1]. The version number got bumped to v12 instead
>> of v11 as I still have a copy of the other version of the patch which
>> I made some changes to and internally named v11.
> If anyone else wants to have a look at these, please do so soon. I'm
> planning on starting to take a serious look at getting 0001-0003 in
> early next week.
>
> David
>
I tested the patched version of Postgres on JOBS benchmark:
https://github.com/gregrahn/join-order-benchmark
For most queries performance is the same, some queries are executed
faster but
one query is 150 times slower:
explain analyze SELECT MIN(chn.name) AS character,
MIN(t.title) AS movie_with_american_producer
FROM char_name AS chn,
cast_info AS ci,
company_name AS cn,
company_type AS ct,
movie_companies AS mc,
role_type AS rt,
title AS t
WHERE ci.note LIKE '%(producer)%'
AND cn.country_code = '[us]'
AND t.production_year > 1990
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND ci.movie_id = mc.movie_id
AND chn.id = ci.person_role_id
AND rt.id = ci.role_id
AND cn.id = mc.company_id
AND ct.id = mc.company_type_id;
explain analyze SELECT MIN(cn.name) AS from_company,
MIN(lt.link) AS movie_link_type,
MIN(t.title) AS non_polish_sequel_movie
FROM company_name AS cn,
company_type AS ct,
keyword AS k,
link_type AS lt,
movie_companies AS mc,
movie_keyword AS mk,
movie_link AS ml,
title AS t
WHERE cn.country_code !='[pl]'
AND (cn.name LIKE '%Film%'
OR cn.name LIKE '%Warner%')
AND ct.kind ='production companies'
AND k.keyword ='sequel'
AND lt.link LIKE '%follow%'
AND mc.note IS NULL
AND t.production_year BETWEEN 1950 AND 2000
AND lt.id = ml.link_type_id
AND ml.movie_id = t.id
AND t.id = mk.movie_id
AND mk.keyword_id = k.id
AND t.id = mc.movie_id
AND mc.company_type_id = ct.id
AND mc.company_id = cn.id
AND ml.movie_id = mk.movie_id
AND ml.movie_id = mc.movie_id
AND mk.movie_id = mc.movie_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------
Finalize Aggregate (cost=300131.43..300131.44 rows=1 width=64)
(actual time=522985.919..522993.614 rows=1 loops=1)
-> Gather (cost=300131.00..300131.41 rows=4 width=64) (actual
time=522985.908..522993.606 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=299131.00..299131.01 rows=1
width=64) (actual time=522726.599..522726.606 rows=1 loops=5)
-> Hash Join (cost=38559.78..298508.36 rows=124527
width=33) (actual time=301521.477..522726.592 rows=2 loops=5)
Hash Cond: (ci.role_id = rt.id)
-> Hash Join (cost=38558.51..298064.76
rows=124527 width=37) (actual time=301521.418..522726.529 rows=2 loops=5)
Hash Cond: (mc.company_type_id = ct.id)
-> Nested Loop (cost=38557.42..297390.45
rows=124527 width=41) (actual time=301521.392..522726.498 rows=2 loops=5)
-> Nested Loop
(cost=38556.98..287632.46 rows=255650 width=29) (actual
time=235.183..4596.950 rows=156421 loops=5)
Join Filter: (t.id = ci.movie_id)
-> Parallel Hash Join
(cost=38556.53..84611.99 rows=162109 width=29) (actual
time=234.991..718.934 rows=119250 loops
=5)
Hash Cond: (t.id =
mc.movie_id)
-> Parallel Seq Scan on
title t (cost=0.00..43899.19 rows=435558 width=21) (actual
time=0.010..178.332 rows=34
9806 loops=5)
Filter:
(production_year > 1990)
Rows Removed by
Filter: 155856
-> Parallel Hash
(cost=34762.05..34762.05 rows=303558 width=8) (actual
time=234.282..234.285 rows=230760 loops
=5)
Buckets: 2097152
(originally 1048576) Batches: 1 (originally 1) Memory Usage: 69792kB
-> Parallel Hash
Join (cost=5346.12..34762.05 rows=303558 width=8) (actual
time=11.846..160.085 rows=230
760 loops=5)
Hash Cond:
(mc.company_id = cn.id)
-> Parallel
Seq Scan on movie_companies mc (cost=0.00..27206.55 rows=841655
width=12) (actual time
=0.013..40.426 rows=521826 loops=5)
-> Parallel
Hash (cost=4722.92..4722.92 rows=49856 width=4) (actual
time=11.658..11.659 rows=16969
loops=5)
Buckets: 131072 Batches: 1 Memory Usage: 4448kB
-> Parallel Seq Scan on company_name cn (cost=0.00..4722.92 rows=49856
width=4) (actual time
=0.014..8.324 rows=16969 loops=5)
Filter: ((country_code)::text = '[us]'::text)
Rows Removed by Filter: 30031
-> Result Cache
(cost=0.45..1.65 rows=2 width=12) (actual time=0.019..0.030 rows=1
loops=596250)
Cache Key: mc.movie_id
Hits: 55970 Misses:
62602 Evictions: 0 Overflows: 0 Memory Usage: 6824kB
Worker 0: Hits: 56042
Misses: 63657 Evictions: 0 Overflows: 0 Memory Usage: 6924kB
Worker 1: Hits: 56067
Misses: 63659 Evictions: 0 Overflows: 0 Memory Usage: 6906kB
Worker 2: Hits: 55947
Misses: 62171 Evictions: 0 Overflows: 0 Memory Usage: 6767kB
Worker 3: Hits: 56150
Misses: 63985 Evictions: 0 Overflows: 0 Memory Usage: 6945kB
-> Index Scan using
cast_info_movie_id_idx on cast_info ci (cost=0.44..1.64 rows=2
width=12) (actual time=0.03
3..0.053 rows=1 loops=316074)
Index Cond:
(movie_id = mc.movie_id)
Filter:
((note)::text ~~ '%(producer)%'::text)
Rows Removed by
Filter: 25
-> Result Cache (cost=0.44..0.59
rows=1 width=20) (actual time=3.311..3.311 rows=0 loops=782104)
Cache Key: ci.person_role_id
Hits: 5 Misses: 156294
Evictions: 0 Overflows: 0 Memory Usage: 9769kB
Worker 0: Hits: 0 Misses:
156768 Evictions: 0 Overflows: 0 Memory Usage: 9799kB
Worker 1: Hits: 1 Misses:
156444 Evictions: 0 Overflows: 0 Memory Usage: 9778kB
Worker 2: Hits: 0 Misses:
156222 Evictions: 0 Overflows: 0 Memory Usage: 9764kB
Worker 3: Hits: 0 Misses:
156370 Evictions: 0 Overflows: 0 Memory Usage: 9774kB
-> Index Scan using
char_name_pkey on char_name chn (cost=0.43..0.58 rows=1 width=20)
(actual time=0.001..0.001 rows
=0 loops=782098)
Index Cond: (id =
ci.person_role_id)
-> Hash (cost=1.04..1.04 rows=4 width=4)
(actual time=0.014..0.014 rows=4 loops=5)
Buckets: 1024 Batches: 1 Memory
Usage: 9kB
-> Seq Scan on company_type ct
(cost=0.00..1.04 rows=4 width=4) (actual time=0.012..0.012 rows=4 loops=5)
-> Hash (cost=1.12..1.12 rows=12 width=4)
(actual time=0.027..0.028 rows=12 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on role_type rt
(cost=0.00..1.12 rows=12 width=4) (actual time=0.022..0.023 rows=12 loops=5)
Planning Time: 2.398 ms
Execution Time: 523002.608 ms
(55 rows)
I attach file with times of query execution.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
join.sql | application/sql | 110.2 KB |
results.csv | text/csv | 2.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2020-12-10 17:16:02 | Re: Change default of checkpoint_completion_target |
Previous Message | David Fetter | 2020-12-10 16:15:12 | Re: [HACKERS] [PATCH] Generic type subscripting |