Re: BUG #16968: Planner does not recognize optimization

From: Eugen Konkov <kes-kes(at)yandex(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PG Bug reporting form <noreply(at)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>, ladayaroslav(at)yandex(dot)ru
Subject: Re: BUG #16968: Planner does not recognize optimization
Date: 2021-05-15 14:34:16
Message-ID: 1035195294.20210515173416@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-performance

Hello David,

I found a case when `not assigning a ressortgroupref to the whole-row var` cause
wrong window function calculations.

I use same query. The difference come when I wrap my query into
function. (see full queries in attachment)

1.
SELECT *
FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
WHERE agreement_id = 161::int AND (o).period_id = 10::int

2.
SELECT *
sum( .... ) over wagreement
FROM ....
WHERE agreement_id = 161::int AND (o).period_id = 10::int
WINDOW wagreement AS ( PARTITION BY agreement_id )

For first query window function calculates SUM over all agreements,
then some are filtered out by (o).period_id condition.

But for second query agreements with "wrong" (o).period_id are filtered out,
then SUM is calculated.

I suppose here is problem with `not assigning a ressortgroupref to the whole-row var`
which cause different calculation when I try to filter: (o).period_id

I will also attach plans for both queries.

Friday, May 14, 2021, 2:52:33 AM, you wrote:

> On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes(at)yandex(dot)ru> wrote:
>> Now I create minimal reproducible test case.
>> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341

>> Optimization is not applyed when I filter/partition by column using composite type name.

> You probably already know this part, but let me explain it just in
> case it's not clear.

> The pushdown of the qual from the top-level query into the subquery,
> or function, in this case, is only legal when the qual references a
> column that's in the PARTITION BY clause of all window functions in
> the subquery. The reason for this is, if we filter rows before
> calling the window function, then it could affect which rows are in
> see in the window's frame. If it did filter, that could cause
> incorrect results. We can relax the restriction a bit if we can
> eliminate entire partitions at once. The window function results are
> independent between partitions, so we can allow qual pushdowns that
> are in all PARTITION BY clauses.

> As for the reason you're having trouble getting this to work, it's
> down to the way you're using whole-row vars in your targetlist.

> A slightly simplified case which shows this problem is:

> create table ab(a int, b int);
> explain select * from (select ab as wholerowvar,row_number() over
> (partition by a) from ab) ab where (ab.wholerowvar).a=1;

> The reason it does not work is down to how this is implemented
> internally. The details are, transformGroupClause() not assigning a
> ressortgroupref to the whole-row var. It's unable to because there is
> no way to track which actual column within the whole row var is in the
> partition by clause. When it comes to the code that tries to push the
> qual down into the subquery, check_output_expressions() checks if the
> column in the subquery is ok to accept push downs or not. One of the
> checks is to see if the query has windowing functions and to ensure
> that the column is in all the PARTITION BY clauses of each windowing
> function. That check is done by checking if a ressortgroupref is
> assigned and matches a tleSortGroupRef in the PARTITION BY clause. In
> this case, it does not match. We didn't assign any ressortgroupref to
> the whole-row var.

> Unfortunately, whole-row vars are a bit to 2nd class citizen when it
> comes to the query planner. Also, it would be quite a bit of effort to
> make the planner push down the qual in this case. We'd need some sort
> of ability to assign ressortgroupref to a particular column within a
> whole-row var and we'd need to adjust the code to check for that when
> doing subquery pushdowns to allow it to mention which columns within
> whole-row vars can legally accept pushdowns. I imagine that's
> unlikely to be fixed any time soon. Whole-row vars just don't seem to
> be used commonly enough to warrant going to the effort of making this
> stuff work.

> To work around this, you should include a reference to the actual
> column in the targetlist of the subquery, or your function, in this
> case, and ensure you use that same column in the PARTITION BY clause.
> You'll then need to write that column in your condition that you need
> pushed into the subquery. I'm sorry if that messes up your design.
> However, I imagine this is not the only optimisation that you'll miss
> out on by doing things the way you are.

> David

--
Best regards,
Eugen Konkov

Attachment Content-Type Size
full.txt text/plain 3.8 KB
direct-plan.txt text/plain 124.4 KB
function-plan.txt text/plain 122.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2021-05-15 14:52:47 Re: BUG #16968: Planner does not recognize optimization
Previous Message varun kamal 2021-05-15 07:39:11 Re: BUG #17007: server process (PID XXXX) was terminated by signal 11: Segmentation fault

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2021-05-15 14:52:47 Re: BUG #16968: Planner does not recognize optimization
Previous Message Marcin Gozdalik 2021-05-14 15:47:20 Re: Very slow "bloat query"