ERROR: corrupt MVNDistinct entry

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: ERROR: corrupt MVNDistinct entry
Date: 2024-12-24 08:00:57
Message-ID: CAMbWs4-2Z4k+nFTiZe0Qbu5n8juUWenDAtMzi98bAZQtwHx0-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I ran into this error in estimate_multivariate_ndistinct, and it can
be reproduced with the query below.

create table t (a int, b int);
insert into t select 1, 1;
create statistics s (ndistinct) on a, b from t;
analyze;

explain select 1 from t t1
left join (select a c1, coalesce(a) c2 from t t2) s on true
group by s.c1, s.c2;
ERROR: corrupt MVNDistinct entry

And the first bad commit is:

2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit
commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Mon Jan 30 13:16:20 2023 -0500

Make Vars be outer-join-aware.

So in this query, there are two grouping expressions: s.c1 is Var t2.a
with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being
{3}, and its contained expression is Var t2.a with empty nullingrels.

This eventually leads to estimate_num_groups creating two separate
GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with
empty nullingrels. As a result, estimate_multivariate_ndistinct
incorrectly assumes there are two matching expressions. When it later
fails to find the exact match for the combination, it mistakenly
concludes that there is a corrupt MVNDistinct entry.

It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.

So I wonder if we can fix this issue by removing the nullingrels
within the grouping expressions first in estimate_num_groups. Such as:

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
@@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List
*groupExprs, double input_rows,
if (groupExprs == NIL || (pgset && *pgset == NIL))
return 1.0;

+ groupExprs = (List *) remove_nulling_relids((Node *) groupExprs,
+ root->outer_join_rels,
+ NULL);
+
/*
* Count groups derived from boolean grouping expressions. For other
* expressions, find the unique Vars used, treating an expression as a Var

Any thoughts?

Thanks
Richard

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2024-12-24 08:53:06 RFC: Allow EXPLAIN to Output Page Fault Information
Previous Message Michael Paquier 2024-12-24 07:38:51 Re: Recovering from detoast-related catcache invalidations