Self contradictory examining on rel's baserestrictinfo

From: ro b <bigbro_wq(at)hotmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Self contradictory examining on rel's baserestrictinfo
Date: 2024-11-25 08:58:04
Message-ID: TYCPR01MB6093E2A06F3F8CF8DF17B5CE852E2@TYCPR01MB6093.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
I committed the major features of self-contradictory examining patches.
I'd like to explain it in the following aspects.

1. Background
A few months ago, when i read source codes of B-tree in routine
_bt_preprocess_keys, i found that there are more contradictory
checking case we can add. I sent email to pgsql-hackers and
then community contributor replied me and told me someone had
already proposed this question. Thanks for taking the time
to address my question. After serveral conversations, i found
that we can do something more. We can place these jobs at planning time.
2. Under the frame
In practice, B-tree index is used widely; Binary operation expression
that Var op Const is often written; Also without data type coercion
and collation. So features above are supported.
There are several considerations 1) self-contradictory situation
is seldom appeared. 2) It's much expensive if spend too much effort
on it and find that rel is not self-contradictory. 3) The expression
evaluation mechanism will do the right things finally. So this is
a compromise.
3. Detailes
      We will get the base restrictinfo list after deconstructed jointree.
1) The baserestrictinfo is a AND-implict list, that mean if one element
of the list is contradictory to others or self-contradictory the whole
rel is self-contradictory, then we can mark the rel is dummy.

2) The stragegy operators(>,>=,=,<,<=) of B-tree, some of them may be
contradictory to others. Another operator need to be considered is <>
the negator of =.
2.1) = against >, >=, =, <, <=, <>
2.2) >, >= against <, <=

3) We can eliminate the weaker restrictive operator if there is a more
restrictive operator (eg. x > 4 and x >=5, the sub-expression x > 4 will
be abandoned). This will simplify the subsequent examining.
A special situation we need to care about is if we encounter operator '='
and there is not contradictory, the operator '=' will dominate the others
operators.

4) We need to record down the details when we encounter the operator '<>'
and then check them oen by one if then operator '=' appear.
I use linear searching when the number of elements is less than or equal 4
or we can't find a support function to sort these elements.
If we find a support function and the number of elements is greater than 4,
sort these elements and use binary searching.
I choose number 4 here for two considerations. 1) the type size of storing
the const value multiply 4 match a cpu cache line usually. 2) In practice
we will not write down much not equal expressions generally.

5) NULL test
When IS NULL is set, it's contradictory to IS NOT NULL obviously.
And it's contradictory to operator that imply IS NOT NULL implictly.
If we find one of these operators ( >,>=,=,<,<=,<> ) and the expression with
the operator make sense (eg. NULL is not in expression) then we can recognize
this situation is contradictory to IS NULL test.

6) Boolean type
Boolean type is a little tricky, because it can evaluate with both of
test and opreator.
When we encounter the operator one of =, <> or expression is a single
Var or NOT Boolexpr with a single Var. We need to extract details to
make a test format (eg. x = true to x is true; x <> not false to x is false.)

When Var is set with IS_TRUE, this is contradictory to IS_NOT_TRUE,
IS_FALSE and IS_UNKNOWN.
When Var is set with IS_FALSE, this is contradictory to IS_TRUE,
IS_NOT_FALSE and IS_UNKNOWN.
When Var is set with IS_UNKNOWN, this is contradictory to IS_NOT_UNKNOWN,
IS_TRUE and IS_FALSE.
      
When the NULL test and self test of of boolean type are both appeared.
If IS NULL is set it's contradictory to IS TRUE, IS FALSE, IS NOT
UNKNOWN; If IS NOT NULL is set it's contradictory to IS UNKNOWN.
      
Finally we need check them carefully when strategy operator appear.
If we found boolean expression is written like x > true or x < false,
it's beyond the scope of the defined values of boolean type, the expression
does not make sense obviously.
If IS_UNKNOWN is set and it's contradictory to stragegy operator.
If IS_NOT_FALSE is set, that means the equivalent expression is (eg. x IS
NULL OR x IS TRUE). The NULL test will always fail with the strategy operator,
so we only need to test whether the sub-expression (x IS TRUE) is
contradictory with the strategy operator. If IS_NOT_TRUE is set, it's
similar to IS_NOT_FALSE.
      
7) Scalar array comparison expression
First we need to deconstruct the const array, figure out the null and non-null
elements.
If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
ALL(array[56, null])), it's contradictory.
Then sort the non-null elements and eliminate any duplicates. We will give up
if we can't get the sorting support function.
1. Expression with operator <>
We can examine them one by one if ALL flag is set, it's similar to the common
expression (eg. x <> 2 and x <> 3);
If ANY flag is set and the number of sorted elements is only one, we can convert
the scalar array expression to a binary operator expression, but we can't do
anything else if there are more than one elements.

2. Expression with one of these operators >,>=,<,<=
We just need check the boundary element (eg. x > ALL(array[1,2,3]) it's equal
to x > 3) and convert expression to the equivalent binary operator expression.

3. Expression with operator =
We treat it as a binary operator expression if the number of sorted elements
is only one;
When the number of sorted elements is more than one.
It is contradictory if ALL flag is set.
If ANY flag is set, we need to postpone them before the collection of same type
expression is completed (eg. x in(1,3) and x in(1,5) we can extract the same
value 1 when we collected all of these expression over). Then we check
the intersection elements.
      
8) Row comparison expression
We need convert row comparison to nonconstructor.
The operator <> and = have already done in analysis phase. So we just need
to take care of one of operators <, <=, > and >=.
The basic rules for evaluating a row comparison expression is the row elements
are compared left-to-right, stopping as soon as an unequal or null pair
of elements is found.
Let us ignore the NULL that may appear in the row comparison expression
temporarily, the logic of deconstructing row comparison expression is like:
ROW(a1, a2, ..., an) >= ROW(b1, b2, ..., bn)
Normal format:
(a1 > b1)
OR (a1 = b1 AND a2 > b2)
OR (a1 = b1 AND a2 = b2 AND a3 > b3)
...
OR (a1 = b1 AND a2 = b2 AND ... AND an >= bn)
The difference between operators in row comparison expression is the more
restrictive operator is used in last sub-condition of every sub-OR expression
(eg. a3 > b3 the more restrictive > is used), but the operator of last
sub-condition (an >= bn) of the last sub-OR expression is according to the
operator of row comparison expression.
We get two regulations:
1. operator '=' is used in every sub-condition except the last one in
every sub-OR expression or the sub-OR expression had no equal sub-condition.
So we can record down the equal sub-condition with a list(i called it
sub-condition-list) and then we will iterate the list to generating
sub-conditions for subsequent sub-OR expression when we meet a new pair
of elements.
2. The more restrictive operator is used in last sub-condition of
every sub-OR expression except the last sub-OR expression. row expression's
operator weill be used int he last sub-condition of last sub-OR expression.

It's time to face the special cases now.
1. NULL value.
If either of this pair of elements is null, we can skip the evaluation of
subsequent elements, but if the NULL come out at first position then the
whole expression will supply nothing.
For example ROW(a,NULL,c) > (1,2,3)
normal format:
a > 1
or a = 1 and NULL > 2
or a = 1 and NULL = 2 and c > 3
The sub-OR expression doesn't make sense if NULL in sub-condition. So the
final converted expression is a > 1.

2. Pairs of elements are equal.
When they contain volatile functions, we will give up and just return back.
If they are not Const that mean element IS NOT NULL. We need add the NULL
test to the sub-condition-list for generating subsequent sub-OR expression.
It's need to ignore them if they are Const, so we don't add it to the
sub-condition-list.
We skip the current sub-OR expression generating, but if the pair of elements
come out at the last position we need be carefull.
If the operator of row comparison expression is a more restrictive operator,
we skip the last sub-OR expression generating, but if the operator is weaker
we need to generate the last sub-OR expression.
For examples:
ROW(a,b,c) > (1,b,3)        ROW(a,2,c) > (1,2,3)
normal format:         normal format:
a > 1         a > 1
or a = 1 and b > b         or a = 1 and 2 > 2
or a = 1 and b = b and c > 3        or a = 1 and 2 = 2 and c > 3
The final expression is The final expression is
a > 1         a > 1
or a = 1 and b IS NOT NULL and c > 3 or a = 1 and c > 3

ROW(a,b,c) > (1,2,c)    ROW(a,b,c) >= (1,2,c)
normal format:       normal format:
a > 1            a > 1
or a = 1 and b > 2 or a = 1 and b > 2
or a = 1 and b = 2 and c > c or a = 1 and b = 2 and c >= c
The final expression is The final expression is
a > 1            a > 1
or a = 1 and b > 2       or a = 1 and b > 2
            or a = 1 and b = 2 and c is not null
If case 1 and 2 are both appeared in row expression.
We will skip the subsequent pairs of elements when we meet the NULL.
So it just needs to be considered case 2 is in front of 1.
A special situation is all of pairs of elements are equal before we
meet NULL, according to the rule of case 2 we don't generate current
sub-OR expression if meet a pairs of elements these are equal.
So the nonconstructor is empty and we need a flag to know the result of
evaluation of the last pair of elements, then the result of row expression
is according to the flag.

3. Pairs of elements are Const but not equal.
Seeing the above normal format, there is no need to generate sub-OR
expression for the subsequent pairs of elements if we meet the pair of
elements they are Const but not equal.
But whether it's need to generate the current sub-OR expression or not is
according to the reulst of evaluation of current pair of elements.
If the reulst is positive we need to generate the sub-OR expression or
do nothing.
If the pair of elements come out at the first position, then the result
of row comparison expression is according to result of the pair of elements.
If the result is positive, the reuslt of row comparison expression is
positive.
Another situation is all of pairs of elements are equal before we meet
the not equal pair of elements, it's similar to what we discussed in case 2.

We will get a OR-expression finally. Then we can do the examining jobs and
prune the OR-expression, we only keep the output expression that is a simple
expression. The OR-format is not better than row expression for subsequent
processing. If we need watch the output OR-expression we should add macro
DEBUG_ROW_DECODE when configure our environment.

9) OR-expression
We need postpone OR-expression if we meet it. Because we can push other
expressions these have the same semantic level with OR-expression down into
the OR-expression. Then we can check every sub-expression with the pushed
down details.
we will abandon the sub-expression if we find the sub-expression is
contradictory.
when all of the sub-expressions are contradictory that mean the OR-expression
is contradictory.
A special situation is once we find a sub-expression is const true and then
the OR-expression is const true.
For example: x > 10 and (x < 9 or y > 10) in the expression we push x > 10
down into the OR-expression, then we will find the sub-expression x < 9 is
contradictory to x > 10 and x < 9 will be abandoned, so the final expression
is x > 10 AND y > 10.

10) EC situation
We may get a clause match the EC mechanism after we finished the all jobs.
For example: x > 10 and (x < 9 or y = 10) we will get the expression y = 10
match the EC, so we need add it into the EC.

11) The phase 2 examining
As we know in the case 10, if we add a ec and we may get a contradictory case
after the generate_base_implied_equalities invoked. So we need to recheck the
rel when rel's baserestrictinfo added clauses.
For example: select * from self_a x join self_a y on x.a= y.a where x.a in
(2,3,45) and x.a in (45, 78) and y.a <> 45
we will get the intersection element 45 and convert
(x.a in (2,3,45) and x.a in (45, 78)) to x.a = 45, then add it into EC.
We will get the generated clause y.a = 45 after
generate_base_implied_equalities invoked, it's contradictory to y.a <>45,
so alias y is self-contradictory finally.

Best regards

Attachment Content-Type Size
patch.diff application/octet-stream 78.6 KB
regresssion_patch.diff application/octet-stream 151.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiro Ikeda 2024-11-25 09:07:39 Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Previous Message Amit Kapila 2024-11-25 08:29:05 Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description