Re: Urgent - SQL left join bug?

From: Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
To: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
Cc: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Urgent - SQL left join bug?
Date: 2017-06-21 08:48:35
Message-ID: CAAkGvS-YTj0M=DT=s2OnaLiG2WYHmOe7eZFyjFMAEHQF+2m4yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The pg planner / optimizer is awesome! It optimizes the 1st SQL's left join
into inner join (and their are equivalent in that SQL) and made me come to
the silly conclusion that it is a "bug"...

On Wed, Jun 21, 2017 at 4:30 PM, Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
wrote:

> Thank you PG guys! I fix the SQL and everything works so well!
>
> On Wed, Jun 21, 2017 at 4:28 PM, Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
> wrote:
>
>> The correct SQL should be:
>>
>> select pha.id,ph.herb_id,pha.name,ph.weight
>> from prescription_herbs as ph left join pharmacy_herbs as pha on
>> ph.herb_id=pha.herb_id *and pha.pharmacy_id=22*
>> where ph.prescription_id=116285 and ph.deleted_at is null and
>> pha.deleted_at is null order by ph.herb_id;
>>
>>
>>
>> On Wed, Jun 21, 2017 at 4:26 PM, Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
>> wrote:
>>
>>> Hi, Pantelis and Heikki, thank you very much for such a quick response!
>>>
>>> I got it. I was so silly.....
>>>
>>> On Wed, Jun 21, 2017 at 4:23 PM, Pantelis Theodosiou <ypercube(at)gmail(dot)com
>>> > wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 21, 2017 at 9:06 AM, Kaijiang Chen <chenkaijiang(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> Hi, I'm running PostgreSQL 9.4.10 on CentOS 6.5. It looks like that I
>>>>> found a bug with left join. It is very URGENT since it is running in the
>>>>> production servers.
>>>>>
>>>>> ### Conditions: ###
>>>>>
>>>>> I have 2 tables:
>>>>>
>>>>> TABLE 1: (2171209 records)
>>>>> \d prescription_herbs
>>>>> Table
>>>>> "public.prescription_herbs"
>>>>> Column | Type |
>>>>> Modifiers
>>>>> ---------------------+--------------------------------+-----
>>>>> ------------------------------------------------------------
>>>>> id | integer | not null
>>>>> default nextval('prescription_herbs_id_seq'::regclass)
>>>>> prescription_id | integer | not null
>>>>> herb_id | integer | not null
>>>>> weight | integer | not null
>>>>> created_at | timestamp(0) without time zone | not null
>>>>> updated_at | timestamp(0) without time zone | not null
>>>>> deleted_at | timestamp(0) without time zone |
>>>>> price | numeric(10,5) |
>>>>> special_manufacture | character varying(255) |
>>>>> cost | numeric(10,5) |
>>>>> pharmacy_id | integer |
>>>>> Indexes:
>>>>> "prescription_herbs_pkey" PRIMARY KEY, btree (id)
>>>>> "prescription_herbs_hid" btree (herb_id)
>>>>> "prescription_herbs_prid" btree (prescription_id)
>>>>>
>>>>> TABLE 2: (4406 records)
>>>>>
>>>>> \d pharmacy_herbs
>>>>> Table "public.pharmacy_herbs"
>>>>> Column | Type |
>>>>> Modifiers
>>>>> -------------+--------------------------------+-------------
>>>>> ------------------------------------------------
>>>>> id | integer | not null default
>>>>> nextval('pharmacy_herbs_id_seq'::regclass)
>>>>> pharmacy_id | integer |
>>>>> herb_id | integer |
>>>>> cost | numeric(10,5) |
>>>>> price | numeric(10,5) |
>>>>> no | character varying(255) |
>>>>> deleted_at | timestamp(0) without time zone |
>>>>> created_at | timestamp(0) without time zone | not null
>>>>> updated_at | timestamp(0) without time zone | not null
>>>>> name | character varying(255) |
>>>>> Indexes:
>>>>> "pharmacy_herbs_pkey" PRIMARY KEY, btree (id)
>>>>> "pharmacy_herbs_herb_id" btree (herb_id)
>>>>>
>>>>> ### BUG: ###
>>>>>
>>>>> I ran a SQL:
>>>>> select pha.id,ph.herb_id,pha.name,ph.weight
>>>>> from prescription_herbs as ph *left join* pharmacy_herbs as pha on
>>>>> ph.herb_id=pha.herb_id
>>>>> where ph.prescription_id=116285 and ph.deleted_at is null and
>>>>> pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;
>>>>>
>>>>> Expected:
>>>>>
>>>>> It should have 10 rows because the SQL "select * from
>>>>> prescription_herbs as ph where ph.prescription_id=116285 and ph.deleted_at
>>>>> is null" returned 10 rows and I'm using *LEFT JOIN *in the above SQL.
>>>>>
>>>>> Actual Result:
>>>>>
>>>>> It returned only 9 rows and the result is the same as JOIN (not LEFT
>>>>> JOIN).
>>>>>
>>>>> ### More info: ###
>>>>>
>>>>> I explain the SQL:
>>>>> explain select pha.id,ph.herb_id,pha.name,ph.weight
>>>>> from prescription_herbs as ph left join pharmacy_herbs as pha on
>>>>> ph.herb_id=pha.herb_id
>>>>> where ph.prescription_id=116285 and ph.deleted_at is null and
>>>>> pha.deleted_at is null and pha.pharmacy_id=22 order by ph.herb_id;
>>>>>
>>>>> I got:
>>>>>
>>>>> Sort (cost=131.73..131.76 rows=10 width=24)
>>>>> Sort Key: ph.herb_id
>>>>> -> * Hash Join* (cost=122.02..131.56 rows=10 width=24)
>>>>> Hash Cond: (ph.herb_id = pha.herb_id)
>>>>> -> Index Scan using prescription_herbs_prid on
>>>>> prescription_herbs ph (cost=0.43..9.68 rows=23 width=8)
>>>>> Index Cond: (prescription_id = 116285)
>>>>> Filter: (deleted_at IS NULL)
>>>>> -> Hash (cost=113.08..113.08 rows=681 width=20)
>>>>> -> Seq Scan on pharmacy_herbs pha (cost=0.00..113.08
>>>>> rows=681 width=20)
>>>>> Filter: ((deleted_at IS NULL) AND (pharmacy_id =
>>>>> 22))
>>>>>
>>>>> I think the above* "Hash Join" SHOULD BE "Hash Left Join"*, right?
>>>>>
>>>>> I tried to explain another SQL:
>>>>> explain select * from doctors d left join prescriptions p on d.id
>>>>> =p.doctor_id;
>>>>>
>>>>> I got:
>>>>> Hash Right Join (cost=2159.33..31453.58 rows=130330 width=2936)
>>>>> Hash Cond: (p.doctor_id = d.id)
>>>>> -> Seq Scan on prescriptions p (cost=0.00..9273.30 rows=130330
>>>>> width=495)
>>>>> -> Hash (cost=576.37..576.37 rows=5037 width=2441)
>>>>> -> Seq Scan on doctors d (cost=0.00..576.37 rows=5037
>>>>> width=2441)
>>>>>
>>>>> The "Hash Right Join" is the correct node.
>>>>>
>>>>> Any help is very appreciated! WAITING...
>>>>>
>>>>> Thanks,
>>>>> Kaijiang
>>>>>
>>>>>
>>>> This is not a bug. The conditions of the "right" table pha - basically
>>>> (pha.pharmacy_id=22) and secondary ((pha.deleted_at is null) )- that you
>>>> have put in the WHERE clause make the query act as if it was an INNER join.
>>>>
>>>> Move them to the ON and you'll get your 10 rows.
>>>>
>>>> select ...
>>>> from prescription_herbs as ph *left join* pharmacy_herbs as pha
>>>> on ph.herb_id=pha.herb_id and pha.deleted_at is null and
>>>> pha.pharmacy_id=22
>>>> where ph.prescription_id=116285 and ph.deleted_at is null
>>>> order by ...
>>>>
>>>> The (pha.deleted_at is null) part may give different results depending
>>>> on where it is placed (ON vs WHERE) but my guess - since you want all the
>>>> 10 rows of the left table - is that it should be in ON, too
>>>>
>>>> Pantelis
>>>>
>>>
>>>
>>
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Mark Manley 2017-06-21 12:48:15 Re: BUG #14709: inconsistent answers with foreign data wrappers to mysql
Previous Message Kaijiang Chen 2017-06-21 08:30:01 Re: Urgent - SQL left join bug?