Re: Performance of TPC-DS Query 95

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance of TPC-DS Query 95
Date: 2024-11-25 22:41:22
Message-ID: CAApHDvr=zusfekWdNGe8dh_C23mBMXjhz6ZYkuJN_vbkgND0vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 26 Nov 2024 at 09:55, Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu> wrote:
> TPC-DS query 95:
> Its execution time is nearly 1 min:

> CTE ws_wh
> -> Hash Join (cost=37772.14..198810.77 rows=7242361 width=12) (actual time=211.161..1443.926 rows=6644004 loops=1)

> If applying this patch:

> - if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
> + if (!is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,

> The execution time is reduced to 6 seconds:

> CTE ws_wh
> -> Hash Join (cost=37772.14..74062.53 rows=7095248 width=12) (actual time=203.407..560.264 rows=719205 loops=1)

> The difference between both query plans is the second one uses Materialize instead of Memoize. From the code, it seems that changing the usage of the cache brings performance improvement unexpectedly.

What's going on here is that you've introduced a new bug which has the
Hash Join effectively perform a semi-join rather than an inner join so
that it only joins the first matching inner-side row instead of all
matching inner-side rows for each outer-side row. The bug is fairly
evident when to see that the Hash Join produces 6644004 rows without
your change and only 719205 rows after you added the bug.

I'm not sure what you expect us to do here, but just in case you're
not aware, we're not going to prioritise performance over correct
results.

I expect the problem you've introduced will be more evident if you use
EXPLAIN ANALYZE VERBOSE. The verbose option will show which joins are
marked "Inner Unique". If you want to learn more about the
optimisation you've broken, see [1].

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c7f5229ad68d7e0e4dd149e3f80257893e404d4

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2024-11-26 04:03:24 Re: CTE Inline On TPC-DS Query 95
Previous Message Ba Jinsheng 2024-11-25 21:02:46 CTE Inline On TPC-DS Query 95