Re: Query plan prefers hash join when nested loop is much faster

From: iulian dragos <iulian(dot)dragos(at)databricks(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query plan prefers hash join when nested loop is much faster
Date: 2020-08-25 10:10:05
Message-ID: CAMNsu3mu2FGVL=qRC50fHfa+SX3PN7tu4sPVRnFmVbWpvBfYHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Aug 25, 2020 at 12:27 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sat, 22 Aug 2020 at 00:35, iulian dragos
> <iulian(dot)dragos(at)databricks(dot)com> wrote:
> > I am trying to understand why the query planner insists on using a hash
> join, and how to make it choose the better option, which in this case would
> be a nested loop.
>
> > | -> Index Scan using
> test_result_module_result_id_idx on test_result (cost=0.57..6911.17
> rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) |
> > | Index Cond: (module_result_id =
> module_result.id)
> |
>
> You might want to check if the pg_stats view reports a realistic
> n_distinct value for test_result.module_result_id. If the
> pg_class.retuples is correct for that relation then that would
> indicate the n_distinct estimate is about 115000. Going by the number
> of rows you've mentioned it would appear a more realistic value for
> that would be -0.4. which is 0 - 1 / (500000000 / 200000000.0).
> However, that's assuming each module_result has a test_result. You
> could run a SELECT COUNT(DISTINCT module_result_id) FROM test_result;
> to get a better idea.
>
> If ANALYZE is not getting you a good value for n_distinct, then you
> can overwrite it. See [1], search for n_distinct.
>

Thanks for the tip! Indeed, `n_distinct` isn't right. I found it in
pg_stats set at 131736.0, but the actual number is much higher: 210104361.
I tried to set it manually, but the plan is still the same (both the actual
number and a percentage, -0.4, as you suggested):

> ALTER TABLE test_result ALTER COLUMN module_result_id SET
(n_distinct=210104361)

You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
ALTER TABLE
Time: 0.205s

>
> David
>
> [1] https://www.postgresql.org/docs/current/sql-altertable.html
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Eisentraut 2020-08-25 10:28:58 Re: pgbouncer bug?
Previous Message Dirk Lattermann 2020-08-25 09:12:35 Trigger transaction isolation