Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-06-02 20:54:41
Message-ID: CAApHDvorfu97V1G=+WvwzdWwO7XjFERqHg55umqF6nvh2kZJQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2 Jun 2020 at 21:05, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> Today I tested the correctness & performance of this patch based on TPC-H
> workload, the environment is setup based on [1]. Correctness is tested by
> storing the result into another table when this feature is not introduced and
> then enable this feature and comparing the result with the original ones. No
> issue is found at this stage.

Thank you for testing it out.

> I also checked the performance gain for TPC-H workload, totally 4 out of the 22
> queries uses this new path, 3 of them are subplan, 1 of them is nestloop. All of
> changes gets a better result. You can check the attachments for reference.
> normal.log is the data without this feature, patched.log is the data with the
> feature. The data doesn't show the 10x performance gain, I think that's mainly
> data size related.

Thanks for running those tests. I had a quick look at the results and
I think to say that all 4 are better is not quite right. One is
actually a tiny bit slower and one is only faster due to a plan
change. Here's my full analysis.

Q2 uses a result cache for the subplan and has about a 37.5% hit ratio
which reduces the execution time of the query down to 67% of the
original.
Q17 uses a result cache for the subplan and has about a 96.5% hit
ratio which reduces the execution time of the query down to 24% of the
original time.
Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
ratio. The execution time is reduced to 91% of the original time only
because the planner uses a different plan, which just happens to be
faster by chance.
Q20 uses a result cache for the subplan and has a 0% hit ratio. The
execution time is 100.27% of the original time. There are 8620 cache
misses.
All other queries use the same plan with and without the patch.

> At the code level, I mainly checked nestloop path and cost_resultcache_rescan,
> everything looks good to me. I'd like to check the other parts in the following days.

Great.

> [1] https://ankane.org/tpc-h

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2020-06-02 21:58:51 Re: Default gucs for EXPLAIN
Previous Message Bruce Momjian 2020-06-02 20:51:16 Re: Default gucs for EXPLAIN