From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-07-31 16:38:23 |
Message-ID: | 8d714510-af73-a908-99c8-fc14536f2669@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
>> I think it really helps to speed up the search with similar deep
>> filtering compared to cluster indexes, but do we have cases where we
>> don't use this algorithm because it takes longer than an usual index?
>> I thought about the situation with wide indexes (with a lot of multiple
>> columns) and having a lot of filtering predicates for them.
> I think that it should be possible for the optimizer to only use
> multi-column SAOP index paths when there is at least likely to be some
> small advantage -- that's definitely my goal. Importantly, we may not
> really need to accurately model the costs where the new approach turns
> out to be much faster. The only essential thing is that we avoid cases
> where the new approach is much slower than the old approach. Which is
> possible (in at least some cases) by making the runtime behavior
> adaptive.
>
> The best decision that the planner can make may be no decision at all.
> Better to wait until runtime where at all possible, since that gives
> us the latest and most accurate picture of things.
>
>> But I'm not sure about this, so it seems to me that this is a problem of
>> improper use of indexes rather.
> It's hard to say how true that is.
>
> Certainly, workloads similar to the TPC-DS benchmark kinda need
> something like MDAM. It's just not practical to have enough indexes to
> support every possible query -- the benchmark is deliberately designed
> to have unpredictable, hard-to-optimize access paths. It seems to
> require having fewer, more general indexes that can support
> multi-dimensional access reasonably efficiently.
>
> Of course, with OLTP it's much more likely that the workload will have
> predictable access patterns. That makes having exactly the right
> indexes much more practical. So maybe you're right there. But, I still
> see a lot of value in a design that is as forgiving as possible. Users
> really like that kind of thing in my experience.
I tend to agree with you, but a runtime estimate cannot give us an
accurate picture when using indexes correctly or
any other optimizations due to the unstable state of the environment in
which the query is executed.
I believe that a more complex analysis is needed here.
>>> Currently, the optimizer doesn't recognize multi-column indexes with
>>> SAOPs on every column as having a valid sort order, except on the
>>> first column. It seems possible that that has consequences for your
>>> patch. (I'm really only guessing, though; don't trust anything that I
>>> say about the optimizer too much.)
>>>
>> Honestly, I couldn't understand your concerns very well, could you
>> describe it in more detail?
> Well, I'm not sure if there is any possible scenario where the
> transformation from your patch makes it possible to go from an access
> path that has a valid sort order (usually because there is an
> underlying index scan) into an access path that doesn't. In fact, the
> opposite situation seems more likely (which is good news) --
> especially if you assume that my own patch is also present.
>
> Going from a bitmap OR (which cannot return sorted output) to a
> multi-column SAOP index scan (which now can) may have significant
> value in some specific circumstances. Most obviously, it's really
> useful when it enables us to feed tuples into a GroupAggregate without
> a separate sort step, and without a hash aggregate (that's why I see
> value in combining your patch with my own patch). You just need to be
> careful about allowing the opposite situation to take place.
>
> More generally, there is a need to think about strange second order
> effects. We want to be open to useful second order effects that make
> query execution much faster in some specific context, while avoiding
> harmful second order effects. Intuitively, I think that it should be
> possible to do this with the transformations performed by your patch.
>
> In other words, "helpful serendipity" is an important advantage, while
> "harmful anti-serendipity" is what we really want to avoid. Ideally by
> making the harmful cases impossible "by construction".
>
I noticed only one thing there: when we have unsorted array values in
SOAP, the query takes longer than
when it has a sorted array. I'll double-check it just in case and write
about the results later.
I am also testing some experience with multi-column indexes using SAOPs.
--
Regards,
Alena Rybakina
Postgres Professional
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2023-07-31 16:53:37 | Re: pgsql: Fix search_path to a safe value during maintenance operations. |
Previous Message | Peter Geoghegan | 2023-07-31 16:34:47 | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan |