Re: Use extended statistics to estimate (Var op Var) clauses

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Use extended statistics to estimate (Var op Var) clauses
Date:	2021-08-11 17:38:24
Message-ID:	5caf5a49-4e3a-a46a-bf19-038878fad9dd@enterprisedb.com
Lists:	pgsql-hackers

On 8/11/21 4:51 PM, Mark Dilger wrote:
>
>
>> On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>
>> This feels like rather an artificial example though. Is there any real
>> use for this sort of clause?
>
> The test generated random combinations of clauses and then checked if
> any had consistently worse performance. These came up. I don't
> know that they represent anything real.
>
> What was not random in the tests was the data in the tables. I've
> gotten curious if these types of clauses (with columns compared
> against themselves) would still be bad for random rather than orderly
> data sets. I'll go check....
>
> testing....
>
> Wow. Randomizing the data makes the problems even more extreme. It
> seems my original test set was actually playing to this patch's
> strengths, not its weaknesses. I've changed the columns to double
> precision and filled the columns with random() data, where column1 gets
> random()^1, column2 gets random()^2, etc. So on average the larger
> numbered columns will be smaller, and the mcv list will be irrelevant,
> since values should not tend to repeat.
>

I tried using the same randomized data set, i.e. essentially

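-- column names follow the statistics definition below; "d" is assumed
create table t (a double precision, b double precision,
                c double precision, d double precision);

insert into t
select random(), pow(random(), 2), pow(random(), 3), pow(random(), 4)
from generate_series(1,1000000) s(i);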

create statistics s (mcv) on a, b, c from t;

But I don't see any difference compared to the estimates without
extended statistics, which is not surprising because there should be no
MCV list built. So I'm a bit puzzled about the claim that random data
make the problems more extreme. Can you explain?
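
One way to confirm whether an MCV list was built is to look at the pg_stats_ext view after running ANALYZE. This is a minimal sketch, assuming the table t and the statistics object s from the snippet above. The clause a < b is only an illustrative (Var op Var) predicate, not one taken from the tests discussed in this thread.

```sql
-- Extended statistics are only populated by ANALYZE.
ANALYZE t;

-- most_common_vals is NULL when no multivariate MCV list was stored
-- for the statistics object.
SELECT statistics_name,
       kinds,
       most_common_vals IS NOT NULL AS has_mcv_list
  FROM pg_stats_ext
 WHERE tablename = 't'
   AND statistics_name = 's';

-- Compare the planner's row estimate with the actual row count
-- for an illustrative (Var op Var) clause.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT * FROM t WHERE a < b;
```

If has_mcv_list comes back false, the estimate in the EXPLAIN output should match what the planner produces with the statistics object dropped, since there is nothing for the extended-statistics code path to use.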

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Use extended statistics to estimate (Var op Var) clauses at 2021-08-11 14:51:36 from Mark Dilger

Responses

Re: Use extended statistics to estimate (Var op Var) clauses at 2021-08-11 22:02:29 from Mark Dilger
