From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2025-01-27 09:50:09 |
Message-ID: | CAPpHfduOn0htAXp6h3gcv1hMUuoQSYfM-sk6TXKcM57C8B4Umw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Andrei!
On Mon, Jan 27, 2025 at 10:52 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 1/25/25 12:04, Alexander Korotkov wrote:
> > On Wed, Jan 15, 2025 at 10:24 AM Andrei Lepikhov <lepihov(at)gmail(dot)com>
wrote:
> >> causes SEGFAULT during index keys evaluation. I haven't dived into it
> >> yet, but it seems quite a typical misstep and is not difficult to fix.
> >
> > Segfault appears to be caused by a typo. Patch used parent rinfo
> > instead of child rinfo. Fixed in the attached patch.
> Great!
> >
> > It appears that your first query also changed a plan after fixing
> > this. Could you, please, provide another example of a regression for
> > short-circuit optimization, which is related to this patch?
> Yes, it may be caused by the current lazy InitPlan evaluation strategy,
> which would only happen if it was really needed.
>
> Examples:
> ---------
>
> EXPLAIN (ANALYZE, COSTS OFF, BUFFERS OFF, TIMING OFF)
> SELECT * FROM bitmap_split_or t1
> WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
> SELECT avg(x) FROM generate_series(1,1e6) AS x)::integer);
>
> without optimisation:
>
> Index Scan using t_a_b_idx on bitmap_split_or t1 (actual rows=1 loops=1)
> Index Cond: (a = 2)
> Filter: ((b = 2) OR (b = ((InitPlan 1).col1)::integer))
> InitPlan 1
> -> Aggregate (never executed)
> -> Function Scan on generate_series x (never executed)
> Planning Time: 0.564 ms
> Execution Time: 0.182 ms
>
> But having it as a part of an array, we forcedly evaluate it for (not
> 100% sure) more precise selectivity estimation:
>
> Index Scan using t_a_b_idx on bitmap_split_or t1
> (actual rows=1 loops=1)
> Index Cond: ((a = 2) AND
> (b = ANY (ARRAY[2, ((InitPlan 1).col1)::integer])))
> InitPlan 1
> -> Aggregate (actual rows=1 loops=1)
> -> Function Scan on generate_series x
> (actual rows=1000000 loops=1)
> Planning Time: 0.927 ms
> Execution Time: 489.933 ms
>
> This also means that if, before the patch, we executed a query
> successfully, after applying the patch, we sometimes may get the error:
> 'ERROR: more than one row returned by a subquery used as an expression'
> because of early InitPlan evaluation. See the example below:
>
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM bitmap_split_or t1
> WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
> SELECT random() FROM generate_series(1,1e6) AS x)::integer);
>
> Index Scan using t_a_b_idx on bitmap_split_or t1
> Index Cond: ((a = 2) AND (b = ANY (ARRAY[2, ((InitPlan
> 1).col1)::integer])))
> InitPlan 1
> -> Function Scan on generate_series x
>
> I think optimisation should have never happened and this is another
> issue, isn't it?
Thank you for your examples. The reason why these example works only with
the patch is that you apply the cast outside of subquery. This is
because d4378c0005 requires OR argument to be either Cost or Param, but not
a cast over the param. Consider this example on master.
# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS OFF, TIMING OFF)
SELECT * FROM bitmap_split_or t1
WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
SELECT avg(x)::integer FROM generate_series(1,1e6) AS x));
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using t_a_b_idx on bitmap_split_or t1 (actual rows=1 loops=1)
Index Cond: ((a = 2) AND (b = ANY (ARRAY[2, (InitPlan 1).col1])))
InitPlan 1
-> Aggregate (actual rows=1 loops=1)
-> Function Scan on generate_series x (actual rows=1000000
loops=1)
Planning Time: 0.731 ms
Execution Time: 577.953 ms
(7 rows)
I expressed my point on this in [1]. We generally greedy about index quals
and there is no logic which prevent us from using a clause and index qual
because of its cost. And there are many cases when this causes regressions
before d4378c0005. One of examples from [1].
# explain analyze select * from t where i = 0 and j = (select slowfunc());
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on t (cost=25000.01..25195.01 rows=1 width=8) (actual
time=0.806..0.807 rows=0 loops=1)
Filter: ((i = 0) AND (j = (InitPlan 1).col1))
Rows Removed by Filter: 10000
InitPlan 1
-> Result (cost=0.00..25000.01 rows=1 width=4) (never executed)
Planning Time: 0.165 ms
Execution Time: 0.843 ms
(7 rows)
------
Regards,
Alexander Korotkov
Supabase
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2025-01-27 09:55:34 | Re: Disabling vacuum truncate for autovacuum |
Previous Message | Ashutosh Bapat | 2025-01-27 09:34:55 | Re: Test to dump and restore objects left behind by regression |