From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2024-06-27 19:56:56 |
Message-ID: | c43ff0d4-a431-4232-8484-dcf8baac1c4e@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 26.06.2024 23:19, Robert Haas wrote:
> I think maybe replying to multiple emails with a single email is
> something you'd be better off doing less often.
Ok, I won't do this in the future. After thinkingit over,I
realizedthatit turnedout to be somekindof messinthe end.
> On Tue, Jun 25, 2024 at 7:14 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>> Sorry, you are right and I'll try to explain more precisely. The first approach is the first part of the patch, where we made "Or" expressions into an SAOP at an early stage of plan generation [0], the second one was the one proposed by A.Korotkov [1].
> [0] isn't doing anything "at an early stage of plan generation". It's
> changing something in *the parser*. The parser and planner are VERY
> different stages of query parsing, and it's really important to keep
> them separate mentally and in discussions.
Thanks for the detailed explanation, I'm always glad to learn new things
for myself)
To be honest, I had an intuitive feeling that the transformation was
called in the analyzer stage, but I wasn't sure about it, so I tried to
summarize it.
As for the fact that in general all this can be divided into two large
stages, parsing and planner is a little new to me. I have reread the
documentation [0] andI foundinformationaboutitthere.
Beforethat, Iwas guidedbyinformationfromthe
CarnegieMellonUniversitylecture andthe BruceMamjian report[1],whichwas
wrongonmypart.
By the way,it turnsout that the queryrewritingstagereferstoan
independentstage,whichis locatedbetweenthe
parserstageandtheplanner/optimizer. I found it from the documentation [2].
[0] https://www.postgresql.org/docs/current/planner-optimizer.html
[1] https://momjian.us/main/writings/pgsql/optimizer.pdf
[2] https://www.postgresql.org/docs/16/rule-system.html
> We should not be changing
> anything about the query in the parser, because that will, as
> Alexander also pointed out, change what gets stored if the user does
> something like CREATE VIEW whatever AS SELECT ...; and we should, for
> the most part, be storing the query as the user entered it, not some
> transformed version of it. Further, at the parser stage, we do not
> know the cost of anything, so we can only transform things when the
> transformed version is always - or practically always - going to be
> cheaper than the untransformed version.
Thank you, now it has become clear to me why it is so important to leave
the transformation at the planner stage.
>> On 24.06.2024 18:28, Robert Haas wrote:
>> Andrei mentioned the problem, which might be caused by the transformation on the later stage of optimization is updating references to expressions in RestrictInfo [3] lists, because they can be used in different parts during the formation of the query plan. As the practice of Self-join removal [4] has shown, this can be expensive, but feasible. By applying the transformation at the analysis stage [0], because no links were created, so we did not encounter such problems, so this approach was more suitable than the others.
> The link you provided for [3] doesn't show me exactly what code you're
> talking about, but I can see why mutating a RestrictInfo after
> creating it could be problematic. However, I'm not proposing that, and
> I don't think it's a good idea. Instead of mutating an existing data
> structure after it's been created, we want to get each data structure
> correct at the moment that it is created. What that means is that at
> each stage of processing, whenever we create a new in-memory data
> structure, we have an opportunity to make changes along the way.
>
> For instance, let's say we have a RestrictInfo and we are creating a
> Path, perhaps via create_index_path(). One argument to that function
> is a list of indexclauses. The indexclauses are derived from the
> RestrictInfo list associated with the RelOptInfo. We take some subset
> of those quals that are deemed to be indexable and we reorder them and
> maybe change a few things and we build this new list of indexclauses
> that is then passed to create_index_path(). The RelOptInfo's list of
> RestrictInfos is not changed -- only the new list of clauses derived
> from it is being built up here, without any mutation of the original
> structure.
>
> This is the kind of thing that this patch can and probably should do.
> Join removal is quite awkward, as you rightly point out, because we
> end up modifying existing data structures after they've been created,
> and that requires us to run around and fix up a bunch of stuff, and
> that can have bugs. Whenever possible, we don't want to do it that
> way. Instead, we want to pick points in the processing when we're
> anyway constructing some new structure and use that as an opportunity
> to do transformations when building the new structure that incorporate
> optimizations that make sense.
Thanks for the idea! I hadn't thought in this direction before, but it
really might just work and solve all our original problems.
By the way, I saw that the optimizer is smart enough to eliminate
duplicates. Below I have conducted a couple of examples where he decides
for himself which expression is more profitable for him to leave.
Wejustneedto addthistransformation,andthe optimizerwillchoosethe
appropriateone)
alena(at)postgres=# explain select * from x where (a = 1 or a = 2) and a in
(1,2);
QUERY PLAN
--------------------------------------------------------------------
Index Only Scan using a_idx on x (cost=0.28..8.61 rows=1 width=4)
Index Cond: (a = ANY ('{1,2}'::integer[]))
(2 rows)
alena(at)postgres=# explain select * from x where a < 3 and (a = 1 or a =
2) and a = ANY(ARRAY[1,2]);
QUERY PLAN
--------------------------------------------------------------------
Index Only Scan using a_idx on x (cost=0.28..8.60 rows=1 width=4)
Index Cond: ((a < 3) AND (a = ANY ('{1,2}'::integer[])))
(2 rows)
ItworksforKorotkov's casetoo,asIseeit:
alena(at)postgres=# create table test as (select (random()*10)::int x,
(random()*1000) y from generate_series(1,1000000) i); create index
test_x_1_y on test (y) where x = 1; create index test_x_2_y on test (y)
where x = 2; vacuum analyze test; SELECT 1000000 CREATE INDEX CREATE
INDEX VACUUM alena(at)postgres=# explain select * from test where (x = 1 or
x = 2) and y = 100 and x in (1,2); QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck
Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
'100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60
rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30
rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap
Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y
= '100'::double precision) (7 rows)
I noticed that the distribute_quals_to_rels function launches at the
stage when it is necessary to generate RestrictInfo lists for relation -
it might be a suitable place for applying transformation.
So, instead of completely replacing the list, we should form a complex
BoolExpr structure with the "AND" operator, which should contain two
expressions, where one of them is BoolExpr with the "OR" operator and
the second is ScalarArrayOpExpr.
Tobe honest,I've alreadystartedwritingcodetodothis,butI'm facedwitha
misunderstandingof howto correctlycreatea
conditionfor"OR"expressionsthatare notsubjectto transformation.For
example,the expressions b=1in the query below:
alena(at)postgres=# explain select * from x where ( (a =5 or a=4) and a =
ANY(ARRAY[5,4])) or (b=1); QUERY PLAN
----------------------------------------------------------------------------------
Seq Scan on x (cost=0.00..123.00 rows=1 width=8) Filter: ((((a = 5) OR
(a = 4)) AND (a = ANY ('{5,4}'::integer[]))) OR (b = 1)) (2 rows)
I see that two expressions have remained unchanged and it only works for
"AND" binary operations.
But I think it might be worth applying this together, where does the
optimizer generate indexes (build_paths_for_OR function)?
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2024-06-27 20:00:33 | Re: Is missing LOGIN Event on Trigger Firing Matrix ? |
Previous Message | Andrew Dunstan | 2024-06-27 19:39:45 | Re: Is missing LOGIN Event on Trigger Firing Matrix ? |