Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Fletcher <andy(at)prestigedigital(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
Date: 2018-08-15 00:44:27
Message-ID: 87sh3g4h8n.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> Marking the function parallel safe doesn't seem wrong to me. The
>> non-parallel-safe part is that the input gets fed to it in different
>> order in different workers. And I don't really think that to be the
>> function's fault.

Tom> So that basically opens the question of whether *any* window
Tom> function calculation can safely be pushed down to parallel
Tom> workers.

Grepping the spec for the phrase "possibly non-deterministic" is quite
enlightening. Leaving out non-determinisms caused by timezone or actual
volatility, leaving out cases of non-determinism that we'd call
"stable", and leaving out features like multisets that we don't support
at all, here's the list of interesting cases (comments after each quoted
paragraph are mine):

6.28 <value expression>

d) An <array value constructor by query>.

i.e. ARRAY(select)

o) An <aggregate function> that specifies MIN or MAX and that simply
contains a <value expression> whose declared type is based on a
character string type, user-defined type, or datetime with time
zone type.

i.e. MIN(x) is non-deterministic if "x" can have distinguishable values
that compare equal. PG doesn't have that for text or timestamptz, unlike
the spec, but it does for citext or other user-defined types.

q) An <array aggregate function>.

i.e. array_agg()

u) A <window function> that specifies ROW_NUMBER, FIRST_VALUE,
LAST_VALUE, NTH_VALUE, NTILE, LEAD, or LAG, or whose associated
<window specification> specifies ROWS.

This covers those cases where window functions don't treat peer rows
together.

7.6 <table reference>

27) A <table reference> is possibly non-deterministic if the simply
contained <table primary> or <joined table> is possibly
non-deterministic or if <sample clause> is specified.

i.e. TABLESAMPLE is non-deterministic

7.16 <query specification>

a) The <set quantifier> DISTINCT is specified and one of the columns
of T has a data type of character string, user-defined type, TIME
WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.

c) The <select list>, <having clause>, or <window clause> contains a
reference to a column C of T that has a data type of character
string, user-defined type, TIME WITH TIME ZONE, or TIMESTAMP WITH
TIME ZONE, and the functional dependency G C, where G is the set
consisting of the grouping columns of T, holds in T.

For both the above two cases, if distinguishable values of a type
compare equal, it's non-deterministic which gets into the result.

7.17 <query expression>

a) The <query expression> contains a <result offset clause>.

b) The <query expression> contains a <fetch first clause>.

f) Both of the following are true:

i) T contains a set operator UNION and ALL is not specified, or T
contains either of the set operators EXCEPT or INTERSECT.

ii) At least one of the following is true:

1) The first or second operand contains a column that has a
declared type of character string.

2) The first or second operand contains a column that has a
declared type of datetime with time zone.

3) The first or second operand contains a column that has a
declared type that is a user-defined type.

(I've left out the many clauses which just amount to "if $thing contains
something which is possibly non-deterministic then it is possibly
non-deterministic")

--
Andrew (irc:RhodiumToad)

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2018-08-15 06:51:54 Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
Previous Message PG Bug reporting form 2018-08-14 22:16:30 BUG #15326: keep getting "ERROR: type "earth" does not exist" while it does exist