Quick Links

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date:	2020-12-10 16:44:03
Message-ID:	4bbece40-d648-b6f9-a323-fcb8f4b45e78@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 09.12.2020 23:53, David Rowley wrote:
> On Tue, 8 Dec 2020 at 20:15, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> I've attached another patchset that addresses some comments left by
>> Zhihong Yu over on [1]. The version number got bumped to v12 instead
>> of v11 as I still have a copy of the other version of the patch which
>> I made some changes to and internally named v11.
> If anyone else wants to have a look at these, please do so soon. I'm
> planning on starting to take a serious look at getting 0001-0003 in
> early next week.
>
> David
>
I tested the patched version of Postgres on JOBS benchmark:

https://github.com/gregrahn/join-order-benchmark

For most queries performance is the same, some queries are executed
faster but
one query is 150 times slower:

explain analyze SELECT MIN(chn.name) AS character,
       MIN(t.title) AS movie_with_american_producer
FROM char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     company_type AS ct,
     movie_companies AS mc,
     role_type AS rt,
     title AS t
WHERE ci.note LIKE '%(producer)%'
AND cn.country_code = '[us]'
AND t.production_year > 1990
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND ci.movie_id = mc.movie_id
AND chn.id = ci.person_role_id
AND rt.id = ci.role_id
AND cn.id = mc.company_id
AND ct.id = mc.company_type_id;
explain analyze SELECT MIN(cn.name) AS from_company,
       MIN(lt.link) AS movie_link_type,
       MIN(t.title) AS non_polish_sequel_movie
FROM company_name AS cn,
     company_type AS ct,
     keyword AS k,
     link_type AS lt,
     movie_companies AS mc,
     movie_keyword AS mk,
     movie_link AS ml,
     title AS t
WHERE cn.country_code !='[pl]'
AND (cn.name LIKE '%Film%'
       OR cn.name LIKE '%Warner%')
AND ct.kind ='production companies'
AND k.keyword ='sequel'
AND lt.link LIKE '%follow%'
AND mc.note IS NULL
AND t.production_year BETWEEN 1950 AND 2000
AND lt.id = ml.link_type_id
AND ml.movie_id = t.id
AND t.id = mk.movie_id
AND mk.keyword_id = k.id
AND t.id = mc.movie_id
AND mc.company_type_id = ct.id
AND mc.company_id = cn.id
AND ml.movie_id = mk.movie_id
AND ml.movie_id = mc.movie_id
AND mk.movie_id = mc.movie_id;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------
Finalize Aggregate (cost=300131.43..300131.44 rows=1 width=64)
(actual time=522985.919..522993.614 rows=1 loops=1)
   -> Gather (cost=300131.00..300131.41 rows=4 width=64) (actual
time=522985.908..522993.606 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         -> Partial Aggregate (cost=299131.00..299131.01 rows=1
width=64) (actual time=522726.599..522726.606 rows=1 loops=5)
               -> Hash Join (cost=38559.78..298508.36 rows=124527
width=33) (actual time=301521.477..522726.592 rows=2 loops=5)
                     Hash Cond: (ci.role_id = rt.id)
                     -> Hash Join (cost=38558.51..298064.76
rows=124527 width=37) (actual time=301521.418..522726.529 rows=2 loops=5)
                           Hash Cond: (mc.company_type_id = ct.id)
                           -> Nested Loop (cost=38557.42..297390.45
rows=124527 width=41) (actual time=301521.392..522726.498 rows=2 loops=5)
                                 -> Nested Loop
(cost=38556.98..287632.46 rows=255650 width=29) (actual
time=235.183..4596.950 rows=156421 loops=5)
                                       Join Filter: (t.id = ci.movie_id)
                                       -> Parallel Hash Join
(cost=38556.53..84611.99 rows=162109 width=29) (actual
time=234.991..718.934 rows=119250 loops
=5)
                                             Hash Cond: (t.id =
mc.movie_id)
                                             -> Parallel Seq Scan on
title t (cost=0.00..43899.19 rows=435558 width=21) (actual
time=0.010..178.332 rows=34
9806 loops=5)
                                                   Filter:
(production_year > 1990)
                                                   Rows Removed by
Filter: 155856
                                             -> Parallel Hash
(cost=34762.05..34762.05 rows=303558 width=8) (actual
time=234.282..234.285 rows=230760 loops
=5)
                                                   Buckets: 2097152
(originally 1048576) Batches: 1 (originally 1) Memory Usage: 69792kB
                                                   -> Parallel Hash
Join (cost=5346.12..34762.05 rows=303558 width=8) (actual
time=11.846..160.085 rows=230
760 loops=5)
                                                         Hash Cond:
(mc.company_id = cn.id)
                                                         -> Parallel
Seq Scan on movie_companies mc (cost=0.00..27206.55 rows=841655
width=12) (actual time
=0.013..40.426 rows=521826 loops=5)
                                                         -> Parallel
Hash (cost=4722.92..4722.92 rows=49856 width=4) (actual
time=11.658..11.659 rows=16969
loops=5)
Buckets: 131072 Batches: 1 Memory Usage: 4448kB
-> Parallel Seq Scan on company_name cn (cost=0.00..4722.92 rows=49856
width=4) (actual time
=0.014..8.324 rows=16969 loops=5)
Filter: ((country_code)::text = '[us]'::text)
Rows Removed by Filter: 30031
                                       -> Result Cache
(cost=0.45..1.65 rows=2 width=12) (actual time=0.019..0.030 rows=1
loops=596250)
                                             Cache Key: mc.movie_id
                                             Hits: 55970 Misses:
62602 Evictions: 0 Overflows: 0 Memory Usage: 6824kB
                                             Worker 0: Hits: 56042
Misses: 63657 Evictions: 0 Overflows: 0 Memory Usage: 6924kB
                                             Worker 1: Hits: 56067
Misses: 63659 Evictions: 0 Overflows: 0 Memory Usage: 6906kB
                                             Worker 2: Hits: 55947
Misses: 62171 Evictions: 0 Overflows: 0 Memory Usage: 6767kB
                                             Worker 3: Hits: 56150
Misses: 63985 Evictions: 0 Overflows: 0 Memory Usage: 6945kB
                                             -> Index Scan using
cast_info_movie_id_idx on cast_info ci (cost=0.44..1.64 rows=2
width=12) (actual time=0.03
3..0.053 rows=1 loops=316074)
                                                   Index Cond:
(movie_id = mc.movie_id)
                                                   Filter:
((note)::text ~~ '%(producer)%'::text)
                                                   Rows Removed by
Filter: 25
                                 -> Result Cache (cost=0.44..0.59
rows=1 width=20) (actual time=3.311..3.311 rows=0 loops=782104)
                                       Cache Key: ci.person_role_id
                                       Hits: 5 Misses: 156294
Evictions: 0 Overflows: 0 Memory Usage: 9769kB
                                       Worker 0: Hits: 0 Misses:
156768 Evictions: 0 Overflows: 0 Memory Usage: 9799kB
                                       Worker 1: Hits: 1 Misses:
156444 Evictions: 0 Overflows: 0 Memory Usage: 9778kB
                                       Worker 2: Hits: 0 Misses:
156222 Evictions: 0 Overflows: 0 Memory Usage: 9764kB
                                       Worker 3: Hits: 0 Misses:
156370 Evictions: 0 Overflows: 0 Memory Usage: 9774kB
                                       -> Index Scan using
char_name_pkey on char_name chn (cost=0.43..0.58 rows=1 width=20)
(actual time=0.001..0.001 rows
=0 loops=782098)
                                             Index Cond: (id =
ci.person_role_id)
                           -> Hash (cost=1.04..1.04 rows=4 width=4)
(actual time=0.014..0.014 rows=4 loops=5)
                                 Buckets: 1024 Batches: 1 Memory
Usage: 9kB
                                 -> Seq Scan on company_type ct
(cost=0.00..1.04 rows=4 width=4) (actual time=0.012..0.012 rows=4 loops=5)
                     -> Hash (cost=1.12..1.12 rows=12 width=4)
(actual time=0.027..0.028 rows=12 loops=5)
                           Buckets: 1024 Batches: 1 Memory Usage: 9kB
                           -> Seq Scan on role_type rt
(cost=0.00..1.12 rows=12 width=4) (actual time=0.022..0.023 rows=12 loops=5)
Planning Time: 2.398 ms
Execution Time: 523002.608 ms
(55 rows)

I attach file with times of query execution.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
join.sql	application/sql	110.2 KB
results.csv	text/csv	2.7 KB

In response to

Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-12-09 20:53:40 from David Rowley

Responses

Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-12-10 22:03:05 from David Rowley
Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2021-02-03 06:51:53 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Stephen Frost	2020-12-10 17:16:02	Re: Change default of checkpoint_completion_target
Previous Message	David Fetter	2020-12-10 16:15:12	Re: [HACKERS] [PATCH] Generic type subscripting