From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-07-17 23:58:06
Message-ID: abef0126-ff75-850f-0a25-063a8ffa07ad@2ndquadrant.com
Lists: pgsql-hackers
On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote:
> ...
>
> Teodor Sigaev has proposed an alternative approach for calculating
> the selectivity of multicolumn joins or compound index searches.
> Usually the DBA creates compound indexes, which the optimizer can use
> to build an efficient query execution plan based on index search.
> We can store statistics for the compound keys of such indexes (as is
> done now for functional indexes) and use them to estimate the
> selectivity of clauses. I have implemented this idea and will publish
> a patch in a separate thread soon.
> Now I just want to share some results for Tomas' examples.
>
> So for vanilla Postgres, without extra statistics, the estimated
> number of rows is about 4 times smaller than the real one.
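
FWIW, for readers following the thread, the "functional indexes" analogy
above is the expression statistics ANALYZE already collects for
expression indexes. A minimal sketch, with hypothetical index names:

    create index foo_expr_idx on foo ((a + b));
    analyze foo;   -- gathers stats on the (a + b) expression,
                   -- just like for a plain column

The proposal would then store similar joint statistics for the key
columns of a regular compound index, e.g. "create index on foo (a, b)".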
>
Can you please post plans with parallelism disabled, and perhaps without
the aggregate? Both make reading the plans unnecessarily difficult ...
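
For example, something like this (on the same foo table) makes the
plans much easier to compare:

    set max_parallel_workers_per_gather = 0;
    explain analyze select * from foo where a = 1 and (b = 1 or b = 2);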
> postgres=# explain analyze SELECT count(*) FROM foo WHERE a=1 and (b=1 or b=2);
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Aggregate  (cost=10964.76..10964.77 rows=1 width=8) (actual time=49.251..49.251 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=513.60..10906.48 rows=23310 width=0) (actual time=17.368..39.928 rows=90000 loops=1)
OK, 23k estimated vs. 90k actual.
>
> If we add statistics for the a and b columns:
>
> create statistics ab on a,b from foo;
> analyze foo;
>
> then the expected result is about 30% larger than the real one: 120k vs. 90k:
>
Eh? The plan however says 9k vs. 30k (per-worker numbers, with loops=3) ...
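
FWIW, you can also look at what the extended statistics actually
contain, which may help explain where the estimate comes from (a plain
catalog query, nothing patch-specific):

    select stxname, stxkind, stxndistinct, stxdependencies
      from pg_statistic_ext
     where stxname = 'ab';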
>          ->  Parallel Bitmap Heap Scan on foo  (cost=2561.33..13424.24 rows=9063 width=0) (actual time=15.551..26.057 rows=30000 loops=3)
...
> With compound index statistics, the estimate is almost equal to the real value:
>
>    ->  Bitmap Heap Scan on foo  (cost=1880.20..13411.66 rows=23310 width=0) (actual time=38.776..61.050 rows=90000 loops=1)
Well, this says 23k vs. 90k.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services