From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2019-03-16 15:54:01 |
Message-ID: | d207e075-9fb3-3a95-7811-8e0ab5292b2a@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/16/19 11:55 AM, Dean Rasheed wrote:
> On Fri, 15 Mar 2019 at 00:06, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I've noticed an annoying thing when modifying type of column not
>> included in a statistics...
>>
>> That is, we don't remove the statistics, but the estimate still changes.
>> But that's because the ALTER TABLE also resets reltuples/relpages:
>>
>> That's a bit unfortunate, and it kinda makes the whole effort to not
>> drop the statistics unnecessarily kinda pointless :-(
>>
>
> Well not entirely. Repeating that test with 100,000 rows, I get an
> initial estimate of 9850 (actual 10,000), which then drops to 2451
> after altering the column. But if you drop the dependency statistics,
> the estimate drops to 241, so clearly there is some benefit in keeping
> them in that case.
>
Sure. What I meant is that to correct the relpages/reltuples estimates
you need to do ANALYZE, which rebuilds the statistics anyway. Although
VACUUM also fixes the estimates, without the stats rebuild.
> Besides, I thought there was no extra effort in keeping the extended
> statistics in this case -- isn't it just using the column
> dependencies, so in this case UpdateStatisticsForTypeChange() never
> gets called anyway?
>
Yes, it does not get called at all. My point was that I was a little bit
confused because the test says "check change of unrelated column type
does not reset the MCV statistics" yet the estimates do actually change.
I wonder why we reset the relpages/reltuples to 0, instead of retaining
the original values, though. That would likely give us better density
estimates in estimate_rel_size, I think.
So I've tried doing that, and I've included it as 0001 into the patch
series. It seems to work, but I suppose the reset is there for a reason.
In any case, this is a preexisting issue, independent of what this patch
does or changes.
I've discovered another issue, though. Currently, clauselist_selectivity
has this as the very beginning:
/*
* If there's exactly one clause, just go directly to
* clause_selectivity(). None of what we might do below is relevant.
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
Which however fails with queries like this:
WHERE (a = 1 OR b = 1)
because clauselist_selectivity sees it as a single clause, passes it to
clause_selectivity and the OR-clause handling simply relies on
(s1 + s2 - s1 * s2)
which entirely ignores the multivariate stats. The other similar places
in clause_selectivity() simply call clauselist_selectivity() so that's
OK, but OR-clauses don't do that.
For functional dependencies this is not a huge issue because those apply
only to AND-clauses. But there were proposals to maybe apply them to
other types of clauses, in which case it might become issue.
I think the best fix is moving the optimization after the multivariate
stats are applied. The only alternative I can think of is modifying
clauselist_selectivity so that it can be executed on OR-clauses. But
that seems much more complicated than the former option for almost no
other advantages.
I've also changed how statext_is_compatible_clause_internal() handles
the attnums bitmapset - you were right in your 3/10 message that we can
just pass the value, without creating a local bitmapset. So I've just
done that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-retain-reltuples-relpages-on-ALTER-TABLE.patch.gz | application/gzip | 1.7 KB |
0002-multivariate-MCV-lists.patch.gz | application/gzip | 39.6 KB |
0003-multivariate-histograms.patch.gz | application/gzip | 48.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-03-16 16:07:55 | Unduly short fuse in RequestCheckpoint |
Previous Message | Robert Haas | 2019-03-16 14:32:16 | Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view? |