Re: post-freeze damage control

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: post-freeze damage control
Date: 2024-04-09 13:20:11
Message-ID: 0c1cfa72-4fa4-4d98-a5e5-30c92e97ce63@postgrespro.ru

On 9/4/2024 12:55, Tom Lane wrote:
> Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
>>> * I really, really dislike jamming this logic into prepqual.c,
>>> where it has no business being. I note that it was shoved
>>> into process_duplicate_ors without even the courtesy of
>>> expanding the header comment:
>
>> Yeah, I preferred to do it in parse_expr.c with the assumption of some
>> 'minimal' or 'canonical' tree form.
>
> That seems quite the wrong direction to me. AFAICS, the argument
> for making this transformation depends on being able to convert
> to an indexscan condition, so I would try to apply it even later,
> when we have a set of restriction conditions to apply to a particular
> baserel. (This would weaken the argument that we need hashing
> rather than naive equal() tests even further, I think.) Applying
> the transform to join quals seems unlikely to be a win.
Our first prototype did this job right at the stage of index path
creation. Unfortunately, that approach turned out to be too narrow and
too expensive. The most problematic cases we encountered came from
BitmapOr paths: if an incoming query has a significant number of OR
clauses, the optimizer spends a lot of time generating these paths,
which in most cases are useless (remember also the memory allocated for
that purpose). Imagine how much worse the situation becomes when we
scale it with partitions.
Another issue we resolved with this transformation: a shorter list of
clauses speeds up planning and sometimes makes cardinality estimation
more accurate.
Moreover, it helps even SeqScan: looking up a value in the hashed array
is much faster than evaluating a long OR expression for each incoming
tuple.
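
To illustrate the SeqScan point, here is a minimal Python sketch. It is
not PostgreSQL code: the tuple-based clause representation and the names
group_or_clauses and matches are made up for illustration. It merges
OR'ed equality clauses on the same column into one array-style clause
and then tests a row with a hash-set lookup instead of walking the whole
OR chain.

```python
import operator

# Hypothetical clause form: (column, operator, constant), all OR'ed together.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def group_or_clauses(clauses):
    """Merge 'col = const' arms on the same column into one '= ANY' clause."""
    groups = {}   # column -> constants seen, in first-seen column order
    others = []   # arms that are not plain equalities stay untouched
    for col, op, const in clauses:
        if op == "=":
            groups.setdefault(col, []).append(const)
        else:
            others.append((col, op, const))
    merged = []
    for col, consts in groups.items():
        if len(consts) > 1:
            # A frozenset stands in for the hashed array.
            merged.append((col, "= ANY", frozenset(consts)))
        else:
            merged.append((col, "=", consts[0]))
    return merged + others


def matches(row, clauses):
    """Evaluate the OR of all clauses against one row (a dict)."""
    for col, op, const in clauses:
        if op == "= ANY":
            if row[col] in const:      # one hash probe, not N comparisons
                return True
        elif OPS[op](row[col], const):
            return True
    return False
```

For example, a chain such as x = 1 OR y < 5 OR x = 2 OR x = 3 collapses
to two clauses, and the test on x becomes a single set lookup per tuple
regardless of how many constants were listed.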

One more idea that I have set aside here is that the planner could make
use of quick clause hashing.
From time to time on the mailing list, I see disputes about different
approaches to expression transformation/simplification/grouping, and
most of the time they end up at the problem of search complexity.
A clause hash could be a way to solve this, couldn't it?
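
A rough Python sketch of what I mean follows. It is only an illustration
under my own assumptions: expressions are modelled as nested tuples, and
clause_hash, dedup_naive and dedup_hashed are hypothetical names, not
planner functions. The naive version compares each clause with every
distinct clause seen so far (the equal()-style approach); the hashed
version compares only within a hash bucket.

```python
def clause_hash(node):
    """Structural hash of an expression tree given as nested tuples."""
    if isinstance(node, tuple):
        return hash((node[0],) + tuple(clause_hash(c) for c in node[1:]))
    return hash(node)


def dedup_naive(clauses):
    """Remove duplicates by pairwise comparison; returns (unique, comparisons)."""
    unique, comparisons = [], 0
    for clause in clauses:
        found = False
        for other in unique:
            comparisons += 1
            if other == clause:
                found = True
                break
        if not found:
            unique.append(clause)
    return unique, comparisons


def dedup_hashed(clauses):
    """Same result, but full comparison only against clauses with equal hash."""
    buckets, unique, comparisons = {}, [], 0
    for clause in clauses:
        bucket = buckets.setdefault(clause_hash(clause), [])
        found = False
        for other in bucket:
            comparisons += 1
            if other == clause:
                found = True
                break
        if not found:
            bucket.append(clause)
            unique.append(clause)
    return unique, comparisons
```

With N distinct clauses the naive search does on the order of N^2
comparisons, while the hashed one does a full comparison only when two
hashes coincide.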

--
regards,
Andrei Lepikhov
Postgres Professional
