Re: BUG #18885: ERROR: corrupt MVNDistinct entry - 2

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: tharakan(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: BUG #18885: ERROR: corrupt MVNDistinct entry - 2
Date: 2025-04-13 23:26:10
Message-ID: CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, 11 Apr 2025 at 01:31, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> I think estimate_multivariate_bucketsize() needs to be more careful
> about building the GroupVarInfo list - in particular, it needs to do the
> dance with examine_variable + add_unique_group_var + pull_var_clause,
> similar to estimate_num_groups() at line ~3532.

This requirement should be documented so that future callers of
estimate_multivariate_ndistinct() don't fall into the same trap.

The attached aims to do this. I also couldn't resist a few other improvements.

There are a few oddities in the code itself that I didn't
adjust. For example, in the first "foreach(lc2, *varinfos)" loop after
the "if (stats)", there's a "found" variable that gets set and used
for no apparent reason. I don't see why the "found = true;" isn't
simply a "continue;". The flag would only be needed if the match were
detected in an inner loop, where "continue" couldn't reach the outer
loop. I also can't make sense of the following comment:

    /*
     * XXX Maybe we should allow searching the expressions even if we
     * found an attribute matching the expression? That would handle
     * trivial expressions like "(a)" but it seems fairly useless.
     */

Maybe it meant "matching the Var"?

The final loop to build the newlist also looks more complex than it
needs to be. The prior loop over *varinfos could have recorded the
list indexes of the matching GroupVarInfos in a Bitmapset, and that
final loop could become:

    foreach(lc, *varinfos)
    {
        if (!bms_is_member(foreach_current_index(lc), matched_varinfos))
            newlist = lappend(newlist, lfirst(lc));
    }
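
To show the shape of that two-pass idea outside the backend, here is a
minimal standalone sketch. It is not the selfuncs.c code: a uint64_t
bitmask stands in for the Bitmapset, plain int arrays stand in for the
List of GroupVarInfos, and mark_matched() and keep_unmatched() are
hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only.  A uint64_t bitmask plays the role of a Bitmapset,
 * so this sketch assumes at most 64 items.  All names are hypothetical.
 */

/* Pass 1: set bit i for every items[i] that also appears in covered[]. */
static uint64_t
mark_matched(const int *items, size_t nitems,
             const int *covered, size_t ncovered)
{
    uint64_t    matched = 0;

    assert(nitems <= 64);

    for (size_t i = 0; i < nitems; i++)
    {
        for (size_t j = 0; j < ncovered; j++)
        {
            if (items[i] == covered[j])
            {
                matched |= UINT64_C(1) << i;
                break;          /* one match is enough for this item */
            }
        }
    }
    return matched;
}

/* Pass 2: copy items whose bit is not set into out[]; return the count. */
static size_t
keep_unmatched(const int *items, size_t nitems, uint64_t matched, int *out)
{
    size_t      nkept = 0;

    assert(nitems <= 64);

    for (size_t i = 0; i < nitems; i++)
    {
        if ((matched & (UINT64_C(1) << i)) == 0)
            out[nkept++] = items[i];
    }
    return nkept;
}
```

With the matched indexes recorded up front, the second pass is a single
membership test per element, which is all the foreach loop above does.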

David

Attachment Content-Type Size
v1-0001-Improve-comments-for-estimate_multivariate_ndisti.patch application/octet-stream 4.0 KB
