RE: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

From: Jan Kort <jan(dot)kort(at)genetics(dot)nl>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date: 2021-05-19 10:55:20
Message-ID: AM0PR0502MB3620FB9E1A9B76AD313A1DBF9B2B9@AM0PR0502MB3620.eurprd05.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi David,

Ok so the problem is that the MCV list is created, because there are groups of common values, but then this has no effect, because there are not enough single values to create a histogram?

That is not such a rare case as I would have hoped, for example this would then also cause the same behavior:

values(1,1),(2,1),(3,1000000),(4,2),(5,2),(6,3),(7,3);

And it seems to do.

Ranges can then be longer too:

values(1,1),(2,1),(4,1),(3,1000000);

for some reason I can't reverse the ranges:

values(1,1000000),(2,1000000),(3,1);

I don’t understand why that doesn't create a bad plan, there seems to be order involved too.

I admit that this is still somewhat rare, but much less rare than I was hoping for.

A few more things:

Using text as primary key seems to perform good too, even if it keeps the same "merge" plan:

create table million (id text primary key); create table three (id text primary key, million_id text not null);

What the high value is and how close it is to the low value seems to matter:

values(1,1),(2,1),(3,100000);
values(1,1),(2,1),(3,10000);
values(1,1),(2,1),(3,1000);

Gets progressively less bad, again the plan is not changed, but the execution of it becomes more efficient.

It seems like it goes through the records from 1 till say 10000 one by one.

Regards,

Jan

-----Oorspronkelijk bericht-----
Van: David Rowley <dgrowleyml(at)gmail(dot)com>
Verzonden: woensdag 19 mei 2021 09:52
Aan: Jan Kort <jan(dot)kort(at)genetics(dot)nl>
CC: pgsql-bugs(at)lists(dot)postgresql(dot)org
Onderwerp: Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

On Wed, 19 May 2021 at 14:13, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> That makes me think the best fix would be to do something better
> during ANALYZE and maybe try and include some more upper bound MCVs.
> I'm not yet too sure what drawbacks there might be from doing that.

I had a quick look at the ANALYZE code and see that we only consider items for the MCV list when they appear more than once in the analyzed set. When it comes to the histogram, we only consider making the histogram if there are at least 2 items that are not covered in the MCV list. The comment mentions:

/*
* Generate a histogram slot entry if there are at least two distinct
* values not accounted for in the MCV list. (This ensures the
* histogram won't collapse to empty or a singleton.) */

So given we only have 2 distinct values in the "three" table and one of those is tracked in the MCV list, there are not enough values remaining to build a histogram.

It seems you've hit about the worst-case here. If you'd had 1 more value in the gfo_zaken_kosten table then that would have been enough to build a histogram. Or if the 98 value was not duplicated then we'd not have built an MCV list and built a histogram with the two values instead.

If you want a workaround, you could do:

alter table trial.gfo_zaken_kosten alter column gfo_zaken_id set statistics 0; delete from pg_statistic where starelid='trial.gfo_zaken_kosten'::regclass and staattnum=2; analyze trial.gfo_zaken_kosten;

that's a bit dirty though. You'd need to do:

alter table trial.gfo_zaken_kosten alter column gfo_zaken_id set statistics -1; analyze trial.gfo_zaken_kosten;

if that table was to ever change, else you might get bad plans due to lack of stats.

David

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Daniel Gustafsson 2021-05-19 11:17:36 Re: BUG #17024: ERROR: column c.relhasoids does not exist at character 245
Previous Message PG Bug reporting form 2021-05-19 10:48:53 BUG #17024: ERROR: column c.relhasoids does not exist at character 245