Re: Bug in row_number() optimization

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <drowley(at)postgresql(dot)org>
Subject: Re: Bug in row_number() optimization
Date: 2022-11-25 03:26:00
Message-ID: CAApHDvrv_1KBjGWsvoFO1dHke57c8bkQUVwSHevkivrf2KjaYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 25 Nov 2022 at 16:00, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> Verified the problem is fixed with this patch. I'm not familiar with
> the WindowAgg execution codes. As far as I understand, this patch works
> because we set ecxt_aggnulls to true, making it a NULL value. And the
> top-level WindowAgg node's "Filter" is strict so that it can filter out
> all the tuples that don't match the intermediate WindowAgg node's run
> condition. So I find the comments about "WindowAggs above us cannot
> reference the result of another WindowAgg" confusing. But maybe I'm
> missing something.

There are two different pass-through modes that the WindowAgg can move
into when it detects that the run condition is no longer true:

1) WINDOWAGG_PASSTHROUGH and
2) WINDOWAGG_PASSTHROUGH_STRICT

#2 is used when the WindowAgg is the top-level one in this query
level. Remember we'll need multiple WindowAgg nodes when there are
multiple different windows to evaluate. The reason that we need #1 is
that if there are multiple WindowAggs, then the top-level one (or just
any WindowAgg above it) might need all the rows from a lower-level
WindowAgg. For example:

SELECT * FROM (SELECT row_number() over(order by id) rn, sum(qty) over
(order by date) qty from t) t where rn <= 10;

if the "order by id" window is evaluated first, we can't stop
outputting rows when rn <= 10 is no longer true as the "order by date"
window might need those. In this case, once rn <= 10 is no longer
true, the WindowAgg for that window would go into
WINDOWAGG_PASSTHROUGH. This means we can stop window func evaluation
on any additional rows. The final query will never see rn==11, so we
don't need to generate that.

The problem is that once the "order by id" window stops evaluating the
window funcs, if the window result is byref, then we leave junk in the
aggregate output columns. Since we continue to output rows from that
WindowAgg for the top-level "order by date" window, we don't want to
form tuples with free'd memory.

Since nothing in the query will ever seen rn==11 and beyond, there's
no need to put anything in that part of the output tuple. We can just
make it an SQL NULL.

What I mean by "WindowAggs above us cannot reference the result of
another WindowAgg" is that the evaluation of sum(qty) over (order by
date) can't see the "rn" column. SQL does not allow it. If it did,
that would have to look something like:

SELECT * FROM (SELECT SUM(row_number() over (order by id)) over (order
by date) qty from t); -- not valid SQL

WINDOWAGG_PASSTHROUGH_STRICT not only does not evaluate window funcs,
it also does not even bother to store tuples in the tuple store. In
this case there's no higher-level WindowAgg that will need these
tuples, so we can just read our sub-node until we find the next
partition, or stop when there's no PARTITION BY clause.

Just thinking of the patch a bit more, what I wrote ends up
continually zeroing the values and marking the columns as NULL. Likely
we can just do this once when we do: winstate->status =
WINDOWAGG_PASSTHROUGH; I'll test that out and make sure it works.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-11-25 04:01:17 Re: Add sub-transaction overflow status in pg_stat_activity
Previous Message Dilip Kumar 2022-11-25 03:21:55 Re: Patch: Global Unique Index