From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-07-17 23:58:06
Message-ID: abef0126-ff75-850f-0a25-063a8ffa07ad@2ndquadrant.com
Lists: pgsql-hackers
On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote:
> ...
>
> Teodor Sigaev has proposed an alternative approach for calculating
> the selectivity of multicolumn joins or compound index searches.
> Usually the DBA creates compound indexes, which the optimizer can use
> to build an efficient query execution plan based on index search.
> We can store statistics for the compound keys of such indexes (as is
> done now for functional indexes) and use them to estimate the
> selectivity of clauses. I have implemented this idea and will publish
> a patch in a separate thread soon.
> Now I just want to share some results for Tomas' examples.
>
> So for vanilla Postgres, without extra statistics, the estimated
> number of rows is about 4 times smaller than the real one.
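
FWIW, for readers following the thread, the "functional indexes" analogy
above is the expression statistics ANALYZE already collects for
expression indexes. A minimal sketch, with hypothetical index names:

    create index foo_expr_idx on foo ((a + b));
    analyze foo;   -- gathers stats on the (a + b) expression,
                   -- just like for a plain column

The proposal would then store similar joint statistics for the key
columns of a regular compound index, e.g. "create index on foo (a, b)".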
>
Can you please post plans with parallelism disabled, and perhaps without
the aggregate? Both make reading the plans unnecessarily difficult ...
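
For example, something like this (on the same foo table) makes the
plans much easier to compare:

    set max_parallel_workers_per_gather = 0;
    explain analyze select * from foo where a = 1 and (b = 1 or b = 2);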
> postgres=# explain analyze SELECT count(*) FROM foo WHERE a=1 and (b=1 or b=2);
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Aggregate  (cost=10964.76..10964.77 rows=1 width=8) (actual time=49.251..49.251 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=513.60..10906.48 rows=23310 width=0) (actual time=17.368..39.928 rows=90000 loops=1)
OK, 23k estimated vs. 90k actual.
>
> If we add statistics for the a and b columns:
>
> create statistics ab on a,b from foo;
> analyze foo;
>
> then the expected result is about 30% larger than the real one: 120k vs. 90k:
>
Eh? The plan however says 9k vs. 30k (per-worker numbers, with loops=3) ...
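
FWIW, you can also look at what the extended statistics actually
contain, which may help explain where the estimate comes from (a plain
catalog query, nothing patch-specific):

    select stxname, stxkind, stxndistinct, stxdependencies
      from pg_statistic_ext
     where stxname = 'ab';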
>          ->  Parallel Bitmap Heap Scan on foo  (cost=2561.33..13424.24 rows=9063 width=0) (actual time=15.551..26.057 rows=30000 loops=3)
...
> With compound index statistics, the estimate is almost equal to the real value:
>
>    ->  Bitmap Heap Scan on foo  (cost=1880.20..13411.66 rows=23310 width=0) (actual time=38.776..61.050 rows=90000 loops=1)
Well, this says 23k vs. 90k.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services