Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>, Andres Freund <andres(at)anarazel(dot)de>, andy(at)prestigedigital(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
Date: 2018-08-14 04:10:38
Message-ID: CAA4eK1+1yszJgG3WBnMrRAEsBymH871c3O8UQtq3iif=siOc0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Aug 13, 2018 at 10:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marko Tiikkaja <marko(at)joh(dot)to> writes:
>> On Mon, Aug 13, 2018 at 7:35 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> Well, the subselect with thelimit going to return different results from
>>> run to run. Unless you add an ORDER BY there's no guaranteed order in
>>> which tuples are returned. So I don't think it's surprising that you're
>>> getting results that differ between runs.
>
>> While this is true, that's missing the point.
>
> Yeah, I agree. I think probably what's happening is that the sub-select
> is getting pushed down to the parallel workers and they are not all
> computing the same set of sub-select results, leading to inconsistent
> answers at the top level.
>

Your analysis is correct. The plan for one of the reported query is as follows:

postgres=# explain select * from repro1 where account in (select
account from repro1 where page
postgres(# = 'success.html' limit 3);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Gather (cost=1000.71..12727.24 rows=3 width=11)
Workers Planned: 2
-> Hash Semi Join (cost=0.71..11726.94 rows=1 width=11)
Hash Cond: (repro1.account = repro1_1.account)
-> Parallel Seq Scan on repro1 (cost=0.00..10532.50
rows=454750 width=11)
-> Hash (cost=0.67..0.67 rows=3 width=4)
-> Limit (cost=0.00..0.64 rows=3 width=4)
-> Seq Scan on repro1 repro1_1
(cost=0.00..19627.50 rows=91823 width=4)
Filter: ((page)::text = 'success.html'::text)
(9 rows)

As Tom said, it is evident from the plan that the Limit clause is
pushed in the inner-side of the parallel plan and not all the workers
compute the same result set for the inner side.

> Likely, we need to treat the presence of a LIMIT/OFFSET in a sub-select
> as making it parallel-unsafe, for exactly the reason that that makes
> its results non-deterministic.
>

Yeah, one idea could be that we detect this in
max_parallel_hazard_walker during the very first pass it performs on
query-tree. Basically, in the SubLink node check, we can detect
whether the subselect has Limit/Offset clause and if so, then we can
treat it as parallel_unsafe. I have tried that way and it prohibits
the parallel plan for the reported queries. However, I think more
analysis and verification is required to see if it can happen in any
other related cases. BTW, will there be any problem if we allow
sub-selects which have sortclause even if the Limit/Offset is present?

Let me know if you have already started working on it, otherwise, I
will prepare an initial patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Hemanth Kumar 2018-08-14 06:38:06 Issue
Previous Message Tom Lane 2018-08-13 22:38:04 Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query