Re: Nested Loop vs Hash Join based on predicate?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Doiron, Daniel" <DoironD(at)advisory(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Nested Loop vs Hash Join based on predicate?
Date: 2016-03-16 21:01:24
Message-ID: CAFj8pRA1oW1Ozx235x-tuCd9M-n63LvdqQu4-6uoK2MwX3H28w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2016-03-16 21:23 GMT+01:00 Doiron, Daniel <DoironD(at)advisory(dot)com>:

> I have the following queries:
>
> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
> select […]
> from f_calc_service a11,
> d_patient_type a12
> where a11.d_patient_pop_id in (336)
> and a11.d_patient_type_id = a12.id
> and a12.short_name = 'I'
> group by a11.d_rate_schedule_id,
> a11.d_payer_id,
> a11.d_patient_pop_id,
> a11.d_patient_type_id
> ;
>
> And
>
> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
> select […]
> from f_calc_service a11,
> d_patient_type a12
> where a11.d_patient_pop_id in (336)
> and a11.d_patient_type_id = a12.id
> and a12.short_name = 'O'
> group by a11.d_rate_schedule_id,
> a11.d_payer_id,
> a11.d_patient_pop_id,
> a11.d_patient_type_id
> ;
>
> Making this one change from short_name = ‘I’ to short_name = ‘O’ changes
> the query execution from 200k ms to 280ms. The first one chooses a Nested
> Loop, the second chooses a hash join. How do I get them both to choose the
> same? There are no values for d_patient_pop_id in (336) and short_name =
> ‘I’.
>

we don't see plans, so it is blind shot,

Probably the estimation for 'I' value is pretty underestimated - so planner
choose nested loop. The reasons can be different - possible correlation
inside data for example.

You can try:

0) ensure so your statistic are current - run statement ANALYZE

a) increase statistic by statement ALTER TABLE xx ALTER COLUMN yyy SET
STATISTICS some number

b) penalize nested loop - statement SET enable_nestloop TO off;

Regards

Pavel

>
> Thanks!
>
> Dan
>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Artem Tomyuk 2016-03-17 09:57:13 using shared_buffers during seq_scan
Previous Message Doiron, Daniel 2016-03-16 20:23:05 Nested Loop vs Hash Join based on predicate?