Re: Row pattern recognition

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: jchampion(at)timescale(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-07-28 08:56:26
Message-ID: 60651930-70bb-c849-1862-e8f7eb109094@postgresfriends.org
Lists: pgsql-hackers

On 7/28/23 09:09, Tatsuo Ishii wrote:
>>> We already recalculate a frame each time a row is processed even
>>> without RPR. See ExecWindowAgg.
>>
>> Yes, after each row. Not for each function.
>
> Ok, I understand now. Taking a closer look at the code, I realized
> that each window function calls update_frameheadpos, which computes
> the frame head position. But it first checks winstate->framehead_valid,
> and if that is already true (probably set by another window function),
> it does nothing.
>
>>> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
>>> which means the frame head is changed each time current row position
>>> changes.
>>
>> Off topic for now: I wonder why this restriction is in place and
>> whether we should respect or ignore it. That is a discussion for
>> another time, though.
>
> My guess is that it is because anything other than ROWS BETWEEN
> CURRENT ROW has little or no meaning. Consider the following example:

Yes, that makes sense.
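
The frame-head caching described in the quoted text further up (every window function calls update_frameheadpos, but the work is skipped once framehead_valid is set) can be sketched roughly as below. This is a standalone illustration, not the executor code: the struct, the `sketch_` function names, and the `head_computations` counter are hypothetical, and the head computation assumes a frame that starts at CURRENT ROW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the executor's window state. */
typedef struct SketchWindowState
{
    long currentpos;        /* position of the current row in the partition */
    long frameheadpos;      /* cached frame head position */
    bool framehead_valid;   /* true once frameheadpos is known for this row */
    int  head_computations; /* instrumentation only: counts real recomputations */
} SketchWindowState;

/* Called by every window function; only the first call per row does work. */
void
sketch_update_frameheadpos(SketchWindowState *ws)
{
    assert(ws != NULL);

    if (ws->framehead_valid)
        return;             /* already computed for this row */

    /* Assumption: the frame starts at CURRENT ROW, so head == current row. */
    ws->frameheadpos = ws->currentpos;
    ws->framehead_valid = true;
    ws->head_computations++;
}

/* Moving to the next row invalidates the cached head position. */
void
sketch_advance_row(SketchWindowState *ws)
{
    assert(ws != NULL);

    ws->currentpos++;
    ws->framehead_valid = false;
}
```

With two window functions sharing one window, calling sketch_update_frameheadpos twice for the same row bumps the counter only once; that is the "after each row, not for each function" behavior discussed above.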

>>>> I strongly disagree with this. Window functions do not need to know
>>>> how the frame is defined, and indeed they should not.
>>> We already break the rule by defining *support functions. See
>>> windowfuncs.c.
>> The support functions don't know anything about the frame, they just
>> know when a window function is monotonically increasing and execution
>> can either stop or be "passed through".
>
> I see the following code in window_row_number_support:
>
>     /*
>      * The frame options can always become "ROWS BETWEEN UNBOUNDED
>      * PRECEDING AND CURRENT ROW". row_number() always just increments by
>      * 1 with each row in the partition. Using ROWS instead of RANGE
>      * saves effort checking peer rows during execution.
>      */
>     req->frameOptions = (FRAMEOPTION_NONDEFAULT |
>                          FRAMEOPTION_ROWS |
>                          FRAMEOPTION_START_UNBOUNDED_PRECEDING |
>                          FRAMEOPTION_END_CURRENT_ROW);
>
> I think it not only knows about the frame but even changes the frame
> options. This seems far from "don't know anything about the frame", no?

That's the planner support function. The row_number() function itself
is not even allowed to *have* a frame, per spec. We allow it, but as
you can see from that support function, we completely replace it.

So all of the partition-level window functions are not affected by RPR
anyway.
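
To make the distinction concrete, here is a minimal standalone sketch of the split between a planner-time support function and the window function itself. Everything in it is a hypothetical stand-in: the `SK_` flag values, the `SketchOptimizeRequest` struct, and the `sketch_` functions are invented for illustration and do not match the real definitions in the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the FRAMEOPTION_* bits. The real flags are
 * defined in the PostgreSQL headers and have different values.
 */
enum
{
    SK_FRAMEOPTION_NONDEFAULT                = 1 << 0,
    SK_FRAMEOPTION_RANGE                     = 1 << 1,
    SK_FRAMEOPTION_ROWS                      = 1 << 2,
    SK_FRAMEOPTION_START_UNBOUNDED_PRECEDING = 1 << 3,
    SK_FRAMEOPTION_END_CURRENT_ROW           = 1 << 4,
    SK_FRAMEOPTION_END_UNBOUNDED_FOLLOWING   = 1 << 5
};

/* Hypothetical, simplified request passed to a planner support function. */
typedef struct SketchOptimizeRequest
{
    int frameOptions;   /* in: what the query specified; out: replacement */
} SketchOptimizeRequest;

/*
 * Planner time: whatever frame the query specified is discarded and
 * replaced with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
 */
void
sketch_row_number_support(SketchOptimizeRequest *req)
{
    assert(req != NULL);

    req->frameOptions = (SK_FRAMEOPTION_NONDEFAULT |
                         SK_FRAMEOPTION_ROWS |
                         SK_FRAMEOPTION_START_UNBOUNDED_PRECEDING |
                         SK_FRAMEOPTION_END_CURRENT_ROW);
}

/*
 * Execution time: the function sees only the zero-based position of the
 * current row within its partition. No frame information is passed in.
 */
long
sketch_row_number(long pos_in_partition)
{
    return pos_in_partition + 1;
}
```

The point of the sketch is that only the planner-side hook touches frame options; the execution-side function has no frame parameter at all, which is why a partition-level function like this would be unaffected by a reduced frame.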

>> I have two comments about this:
>>
>> It isn't just for convenience, it is for correctness. The window
>> functions do not need to know which rows they are *not* operating on.
>>
>> There is no such thing as a "full" or "reduced" frame. The standard
>> uses those terms to explain the difference between before and after
>> RPR is applied, but window functions do not get to choose which frame
>> they apply over. They only ever apply over the reduced window frame.
>
> I agree that "full window frame" and "reduced window frame" do not
> exist at the same time, and in the end (after computation of the
> reduced frame), only the "reduced" frame is visible to window
> functions/aggregates. But I still think that "full window frame" and
> "reduced window frame" are important concepts for explaining and
> understanding how RPR works.

If we are just using those terms for documentation, then okay.
--
Vik Fearing
