Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

From: Troels Nielsen <bn(dot)troels(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Nicholas White <n(dot)j(dot)white(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Date: 2013-06-21 22:29:03
Message-ID: CAOdE5WQnfR737OkxNXuLWnwpL7=OUssC-63ijP2=2RnRTw8emQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all

I've been examining PostgreSQL to gain a greater understanding
of RDBMS. (Thanks for a nice, very educational system!)

In the process I've been looking into a few problems and the
complications of this patch appeared relatively uninvolved, so I
tried to look for a solution.

I found the following:

The grammar conflict appears to be because of ambiguities in:
1. table_ref (used exclusively in FROM clauses)
2. index_elem (used exclusively in INDEX creation statements).

Now, this doesn't seem to make much sense, as AFAICT window functions
are explicitly disallowed in these contexts (transformWindowFuncCall
will yield errors, and I can't really wrap my head around what a
window function call would mean there).

I therefore propose a simple rearrangement of the grammar,
syntactically disallowing window functions in the outer part of those
contexts (a_expr's inside can't and shouldn't be done much about)
which will allow both RESPECT and IGNORE to become unreserved
keywords, without doing any lexer hacking or abusing the grammar.

I've attached a patch which will add RESPECT NULLS and IGNORE NULLS to
the grammar in the right place. Also the window frame options are set
but nothing more, so this patch needs to be merged with Nicholas White's
original patch.

One problem I see with this approach to the grammar is that the
error messages will change when putting window functions in any of the
forbidden places. The new error messages are I think worse and less
specific than the old ones. I suppose that can be fixed though, or
maybe the problem isn't so severe.

Example of old error message:
window functions are not allowed in functions in FROM

New error message:
syntax error at or near "OVER"

in addition I think the original patch as it stands has a few
problems that I haven't seen discussed:

1. The result of the current patch using lead

create table test_table (
id serial,
val integer);
insert into test_table (val) select * from unnest(ARRAY[1,2,3,4,NULL, NULL,
NULL, 5, 6, 7]);

select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 | 7
7 | 7
(10 rows)

I would expect it to output:

select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 5
4 | 6
| 6
| 6
| 6
5 | 7
6 |
7 |
(10 rows)

That is: skip two rows forward not counting null rows.

The lag behavior works well as far as I can see.

2. It would be nice if an error was given when ignore nulls was used
on a function for which it had no effect. Perhaps this should be up to
the different window function themselves to do though.

Apart from those points I think the original patch is nice and provides a
functionality
that's definitely nice to have.

Kind Regards
Troels Nielsen

On Fri, Jun 21, 2013 at 8:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >> Regardless of what syntax we settle on, we should also make sure that
> >> the conflict is intrinsic to the grammar and can't be factored out, as
> >> Tom suggested upthread. It's not obvious to me what the actual
> >> ambiguity is here. If you've seen "select lag(num,0)" and the
> >> lookahead token is "respect", what's the problem? It sort of looks
> >> like it could be a column label, but not even unreserved keywords can
> >> be column labels, so that's not it. Probably deserves a bit more
> >> investigation...
> >
> > I think the problem is when the function is used as a table function;
> > e.g.:
> >
> > SELECT * FROM generate_series(1,10) respect;
>
> Ah ha. Well, there's probably not much help for that. Disallowing
> keywords as table aliases would be a cure worse than the disease, I
> think. I suppose the good news is that there probably aren't many
> people using RESPECT as a column name, but I have a feeling that we're
> almost certain to get complaints about reserving IGNORE. I think that
> will have to be quoted as a PL/pgsql variable name as well. :-(
>
> >> We could just add additional, optional Boolean argument to the
> >> existing functions. It's non-standard, but we avoid adding keywords.
> >
> > I thought about that, but it is awkward because the argument would have
> > to be constant (or, if not, it would be quite strange).
>
> True... but e.g. string_agg() has the same issue.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Attachment Content-Type Size
respect_nulls_and_ignore_nulls_grammar.patch application/octet-stream 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-06-21 22:34:28 Re: Fixed Cardinality estimation with equality predicates between column of the same table
Previous Message Thom Brown 2013-06-21 21:52:04 Re: Unaccent performance