Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Date: 2021-02-18 12:58:13
Message-ID: CAKU4AWqhLP5h++PDNZ70QgCK5Bk2jVRjG3T87A=j_-3-qNqoMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> > On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> >> The reason I don't really like this is that it really depends where
> >> you want to use RelOptInfo.notnullattrs. If someone wants to use it
> >> to optimise something before the base quals are evaluated then they
> >> might be unhappy that they found some NULLs.
> >>
> >
> > Do you mean the notnullattrs is not set correctly before the base quals
> are
> > evaluated? I think we have lots of data structures which are set just
> after some
> > stage. but notnullattrs is special because it is set at more than 1
> stage. However
> > I'm doubtful it is unacceptable, Some fields ever change their meaning
> at different
> > stages like Var->varno. If a user has a misunderstanding on it, it
> probably will find it
> > at the testing stage.
>
> You're maybe focusing too much on your use case for notnullattrs. It
> only cares about NULLs in the result for each query level.
>
> .... thinks of an example...
>
> OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
> decided that I might like to write a patch which rewrite the query to
> use COUNT(*) when it was certain that "id" could not contain NULLs.
>
> The query is:
>
> SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
> JOIN sales s ON p.partid = s.partid GROUP BY p.partid;
>
> sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
> the patch, I checked the comment for notnullattrs and it says "Not
> null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
> should be ok to assume since sales.saleid is marked in notnullattrs
> that I can rewrite the query?!
>
> The documentation about the RelOptInfo.notnullattrs needs to be clear
> what exactly it means. I'm not saying your representation of how to
> record NOT NULL in incorrect. I'm saying that you need to be clear
> what exactly is being recorded in that field.
>
> If you want it to mean "attribute marked here cannot output NULL
> values at this query level", then you should say something along those
> lines.
>
> However, having said that, because this is a Bitmapset of
> pg_attribute.attnums, it's only possible to record Vars from base
> relations. It does not seem like you have any means to record
> attributes that are normally NULLable, but cannot produce NULL values
> due to a strict join qual.
>
> e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;
>
> I'd expect the RelOptInfo for t not to contain a bit for the
> "nullable" column, but there's no way to record the fact that the join
> RelOptInfo for {t,j} cannot produce a NULL for that column. It might
> be quite useful to know that for the UniqueKeys patch.
>

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test case
for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Besides the above fix in v3, I changed the comments alongs the notnullattrs
as below and added a true positive helper function is_var_nullable.

+ /*
+ * Not null attrs, the values are calculated by looking into
pg_attribute and quals
+ * However both cases are not reliable in some outer join cases. So
when
+ * we want to check if a Var is nullable, function is_var_nullable
is a good
+ * place to start with, which is true positive.
+ */
+ Bitmapset *notnullattrs;

I also found separating the two meanings is unnecessary since both of them
are not reliable in the outer join case and we are just wanting which attr
is
nullable, no match how we know it. The below example shows why
not-null-by-qual
is not readable as well (at least with current implementation)

create table n1(a int, b int not null);
create table n2(a int, b int not null);
create table n3(a int, b int not null);

select * from n1 left join n2 on n1.a = n2.a full join n3 on n2.a = n3.a;

In this case, when we check (n1 left join n2 on n1.a = n2.a) , we know
n1.a is not nullable. However after full join with n3, it changed.

> I know there's another discussion here between Ashutosh and Tom about
> PathTarget's and Vars. I had the Var idea too once myself [1] but it
> was quickly shot down. Tom's reasoning there in [1] seems legit. I
> guess we'd need some sort of planner version of Var and never confuse
> it with the Parse version of Var. That sounds like quite a big
> project which would have quite a lot of code churn. I'm not sure how
> acceptable it would be to have Var represent both these things. It
> gets complex when you do equal(var1, var2) and expect that to return
> true when everything matches apart from the notnull field. We
> currently have this issue with the "location" field and we even have a
> special macro which just ignores those in equalfuncs.c. I imagine not
> many people would like to expand that to other fields.
>
> It would be good to agree on the correct representation for Vars that
> cannot produce NULLs so that we don't shut the door on classes of
> optimisation that require something other than what you need for your
> case.
>
> David
>
> [1]
> https://www.postgresql.org/message-id/flat/14678.1401639369%40sss.pgh.pa.us#d726d397f86755b64bb09d0c487f975f
>

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachment Content-Type Size
v3-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch application/octet-stream 7.4 KB
v3-0002-Some-hack-for-local-testing-only.patch application/octet-stream 1.9 KB
notnull.sql application/octet-stream 1.5 KB
notnull.out application/octet-stream 9.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2021-02-18 13:21:37 Re: Printing LSN made easy
Previous Message Daniel Gustafsson 2021-02-18 12:51:18 Disallow SSL compression?