Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: ojford(at)gmail(dot)com
Cc: vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, andrew(at)tao11(dot)riddles(dot)org(dot)uk, david(at)fetter(dot)org, krasiyan(at)gmail(dot)com
Subject: Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Date: 2024-09-12 01:41:07
Message-ID: 20240912.104107.1194619634354111525.ishii@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> >> On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii(at)sraoss(dot)co(dot)jp> wrote:
>> >>>
>> >>> Attached is the patch to implement this (on top of your patch).
>> >>>
>> >>> test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
>> >>> ERROR: window function row_number cannot have RESPECT NULLS or IGNORE NULLS
>> >>
>> >>
>> >> The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it was suggested to make the feature generalizable, beyond what the standard says it should be limited to.
>> >>
>> >> With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple column arguments (which I'll add in v2 of the patch if the design's accepted).
>> >>
>> >> So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it generalized yet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column arguments).

It seems you allow to use IGNORE NULLS for all window functions. If
the case, you should explicitely stat that in the docs. Otherwise
users will be confused because;

1) The SQL standard says IGNORE NULLS only for lead, lag, first_value,
last_value and nth_value.

2) Some window function returns same rows with IGNORE NULLS/RESPECT
NULLS. Consider following case.

test=# create table t1(i int);
CREATE TABLE
test=# insert into t1 values(NULL),(NULL);
INSERT 0 2
test=# select * from t1;
i
---


(2 rows)

test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER BY i);
row_number
------------
1
2
(2 rows)

The t1 table only contains NULL rows. By using IGNORE NULLS, I think
it's no wonder that a user expects 0 rows returned, if there's no
mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just
ignored in some window functions.

Instead I think it's better that other than lead, lag, first_value,
last_value and nth_value each window function errors out if IGNORE
NULLS/RESPECT NULL are passed to these window functions.

I take a look at the patch and noticed that following functions have
no comments on what they are doing and what are the arguments. Please
look into other functions in nodeWindowAgg.c and add appropriate
comments to those functions.

+static void increment_notnulls(WindowObject winobj, int64 pos)

+static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno,
+ int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) {

+static Datum ignorenulls_getfuncarginframe(WindowObject winobj, int argno,
+ int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) {

Also the coding style does not fit into our coding standard. They should be written something like:

static void
increment_notnulls(WindowObject winobj, int64 pos)

static Datum
ignorenulls_getfuncarginpartition(WindowObject winobj, int argno,
int relpos, int seektype, bool set_mark, bool *isnull, bool *isout)
{

static Datum
ignorenulls_getfuncarginframe(WindowObject winobj, int argno,
int relpos, int seektype, bool set_mark, bool *isnull, bool *isout)
{

See also:
https://www.postgresql.org/docs/current/source-format.html

+ int ignore_nulls; /* ignore nulls */

You should add more comment here. I.e. what values are possible for
ignore_nulls.

I also notice that you have an array in memory which records non-null
row positions in a partition. The position is represented in int64,
which means 1 entry consumes 8 bytes. If my understanding is correct,
the array continues to grow up to the partition size. Also the array
is created for each window function (is it really necessary?). I worry
about this because it might consume excessive memory for big
partitions.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-09-12 01:43:42 Re: Remove useless GROUP BY columns considering unique index
Previous Message Thomas Munro 2024-09-12 01:05:59 Re: CI, macports, darwin version problems