From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> |
Subject: | Re: Underspecified window queries in regression tests |
Date: | 2011-10-17 09:48:38 |
Message-ID: | D91C5B29-8B22-4EC2-803D-B7C4677E8EA5@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Oct17, 2011, at 01:09 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> ... reading those parts again, I realize the it says "When ORDER BY is omitted
>> the *default* frame consists ... ", and that the second quote is followed
>> by a footnote which says
>
>> There are options to define the window frame in other ways, but this tutorial
>> does not cover them. See Section 4.2.8 for details. [3.5, Window Functions]
>
>> So it was just me being thick. Sorry for the noise.
>
> Hmm. Maybe the use of a <footnote> there is too subtle, and we should
> instead have that text in-line (probably in parentheses)? Or we could
> use a <note>, but that's probably too much emphasis.
Inline and in parentheses sounds fine.
In addition, I think we should reword the explanation in 4.2.8 (The SQL Language
/ SQL Syntax / Value Expressions / Window Functions). Instead of that rather
long (and IMHO hard to read) paragraph about possible frame clauses and their
behaviour in the presence or absence of an ORDER BY clause, we should go with
a more algorithmic explanation I think.
Something along these lines maybe:
----------
.) PARTITION BY splits the rows into disjoint partitions. All further processing
happens only inside a single partition
.) In RANGE mode, ORDER BY then splits each partition into an ordered list of
sub-partitions, each containing rows which the ORDER BY considers to be
equivalent.
.) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if
there are rows which are considered to be equivalent by the ORDER BY, the
ordering of the sub-partition isn't fully determined.
.) Each row's frame then consists of some consecutive range of sub-partitions.
.) In RANGE mode, that consecutive range can only start at either the first
sub-partition or the current row's sub-partition, and can only end at either
the current row's sub-partition or the last sub-partitions.
.) In ROWS mode, the consecutive range may additional start <n> sub-partitions
(or rows, it's the same thing here) before the current row, and may additionally
end <m> sub-partitions/rows after the current row.
From that, it follows that even with an underspecified sort order, the contents of
each frame are still fully determined in RANGE mode. The ordering of rows within
a frame is not determined, though. So overall, in RANGE mode, a query's result is
only non-deterministic if the window function is sensitive to the ordering of rows
within a frame.
In ROWS mode, OTOH, the contents each frame themselves are not fully determined,
so even an ordering agnostic window function may produce non-deterministic results.
----------
If you think that something along these lines would be an improvement, I can try
to come up with a patch.
best regards,
Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2011-10-17 10:53:53 | termination of backend waiting for sync rep generates a junk log message |
Previous Message | Jan Urbański | 2011-10-17 08:51:15 | Re: proposal: set GUC variables for single query |