From: David Kubečka <kubecka(dot)dav(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Weirdly pesimistic estimates in optimizer
Date: 2015-03-04 23:08:10
Message-ID: CAGLMwk_aCL55y_9eVRA5cMHd-L--Z+C+NHjzbV9MX8oQiDpGYg@mail.gmail.com
Lists: pgsql-hackers
Hi Tomas and others,
2015-03-02 21:29 GMT+01:00 Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>:
> Hi David ;-)
>
> On 2.3.2015 20:19, David Kubečka wrote:
> >
> > The question is why the optimizer, or rather the cost estimator,
> > produced such different estimates for such a small change in input.
> > Moreover, it seems that the main culprit of the bad estimates isn't
> > actually directly related to the outer table, but rather to the inner table.
> > Just compare estimates for the two index scans:
> >
> > With 'random_fk_dupl':
> > -> Index Scan using facts_fk_idx on facts (cost=0.42..5.75
> > rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
> > With 'random_fk_uniq':
> > -> Index Scan using facts_fk_idx on facts (cost=0.42..214.26
> > rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
> >
> > I have read the optimizer README file and also looked briefly at the
> > code, but this doesn't seem to be related to a particular algorithm
> > implementation (e.g. nested loop). Perhaps it's the way cost estimates
> > are propagated down (or sideways? that would be weird...) the query
> > tree. But I am really not sure, since this is my first time looking at
> > the optimizer code base. I should also add that I have reproduced this
> > behaviour for all versions of Pg from 9.2 up to the current devel.
>
> Interesting. I've only spent a few minutes looking at this, but it
> certainly is a bit strange that using a smaller table with unique values
> results in a slower plan than using a table with duplicates (but the
> same number of unique values).
>
Yep. And I think I have found the cause of the strangeness. I have traced
the planner code, and this is how cost_index from costsize.c is called when
computing the cost of the index scan on the outer (facts) table, when the
inner table is the one with duplicate entries (random_fk_dupl):
cost_index (path=0x27f8fb8, root=0x27670e8, loop_count=9920)
The important thing here is the value of loop_count, which is the (perfect
estimate of the) number of rows of random_fk_dupl (I have recreated the
table, so the count differs slightly from the previously posted version,
9893). Now the predominant part of the run cost is computed by this
expression:
run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
Since csquared (indexCorrelation * indexCorrelation) is very small in this
case, the biggest term here is max_IO_cost, which is computed as follows:
max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
The division by loop_count accounts for the predicted effect of caching,
and it is exactly this division that makes the run_cost for a single index
lookup so low compared with the query variant using random_fk_uniq. So the
main problem is that the planner calls cost_index with loop_count equal to
the number of rows of the inner table, *but it ignores semi-join
semantics*, i.e. it doesn't account for the number of *unique* rows in the
inner table, which will actually be the number of loops.
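To make the size of that effect concrete, here is a minimal, self-contained
sketch (plain C, not planner code; pages_fetched, min_IO_cost and csquared
are made-up illustrative values, and in the real planner pages_fetched
itself also depends on loop_count) that evaluates the two formulas above
once with loop_count = 9920 (the row count of random_fk_dupl) and once with
loop_count = 99 (roughly the number of distinct fk values that actually
drive the nested loop):

/* Illustrative sketch only: the inputs are assumed values, and just the
 * structure of the two formulas quoted above mirrors costsize.c. */
#include <stdio.h>

/* IO portion of the run cost for one repetition of the inner index scan. */
static double
io_run_cost(double pages_fetched, double spc_random_page_cost,
            double min_IO_cost, double csquared, double loop_count)
{
    /* max_IO_cost: all page fetches treated as random, amortized over
     * loop_count repetitions (the predicted caching effect). */
    double max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;

    /* Interpolate between worst case and best case by correlation^2. */
    return max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
}

int
main(void)
{
    double pages_fetched = 500.0;      /* assumed */
    double spc_random_page_cost = 4.0; /* default random_page_cost */
    double min_IO_cost = 1.0;          /* assumed */
    double csquared = 0.001;           /* indexCorrelation^2, nearly zero */

    printf("loop_count=9920: %.2f\n",
           io_run_cost(pages_fetched, spc_random_page_cost,
                       min_IO_cost, csquared, 9920.0));
    printf("loop_count=99:   %.2f\n",
           io_run_cost(pages_fetched, spc_random_page_cost,
                       min_IO_cost, csquared, 99.0));
    return 0;
}

With the full inner-table row count in the denominator the per-loop IO cost
comes out roughly a hundred times smaller, which is the same kind of gap as
the 5.75 vs. 214.26 estimates quoted at the top of the thread.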
Since this is my first time looking into the optimizer (and in fact any
Postgres) code, I am not yet able to locate the exact place where this
should be fixed, but I hope to have a patch within a few days :-)
>
> Another observation is that the planner does see other plans, because
> tweaking the cost variables like this:
>
> SET random_page_cost = 2;
>
> produces this plan (which on my machine runs just a tad slower than the
> first query in your post):
>
> QUERY PLAN
> ---------------------------------------------------------------------
> HashAggregate (cost=10564.81..10688.47 rows=9893 width=15)
> Group Key: facts.fk
> -> Nested Loop (cost=5.42..10515.34 rows=9893 width=15)
> -> HashAggregate (cost=2.24..3.23 rows=99 width=4)
> Group Key: random_fk_uniq.fk
> -> Seq Scan on random_fk_uniq (cost=0.00..1.99 ...)
> -> Bitmap Heap Scan on facts (cost=3.18..105.18 rows=100 ...)
> Recheck Cond: (fk = random_fk_uniq.fk)
> -> Bitmap Index Scan on facts_fk_idx (cost=0.00..3.15 ...)
> Index Cond: (fk = random_fk_uniq.fk)
>
Yeah, this now makes perfect sense to me. From the run_cost and max_IO_cost
formulas you can see that (in this case!) random_page_cost acts almost
exactly as a linear factor. So the outcome is the complete opposite of what
I wished for at first: instead of tweaking Postgres to produce lower costs
for random_fk_uniq, it will start producing higher costs for
random_fk_dupl. But I now believe that this is correct.
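For what it's worth, here is a tiny sketch of that linearity (same assumed
pages_fetched as in the previous snippet, with the correlation term dropped
because csquared is nearly zero here): halving random_page_cost roughly
halves the estimated IO cost per loop.

/* Illustrative only: with indexCorrelation ~ 0 the IO part of the run cost
 * reduces to approximately pages_fetched * random_page_cost / loop_count,
 * i.e. it scales linearly with random_page_cost.  Inputs are assumed. */
#include <stdio.h>

int
main(void)
{
    double pages_fetched = 500.0; /* assumed, as in the previous sketch */
    double loop_count = 99.0;     /* distinct fk values driving the loop */

    for (double rpc = 4.0; rpc >= 1.0; rpc /= 2.0)
        printf("random_page_cost = %.1f -> approx IO run cost = %.2f\n",
               rpc, pages_fetched * rpc / loop_count);
    return 0;
}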
Regards,
-David
> I can get similar results by setting cpu_operator_cost=0.005. And
> further lowering random_page_cost to 1.1 actually gets me this:
>
> QUERY PLAN
> ---------------------------------------------------------------------
> HashAggregate (cost=6185.41..6309.08 rows=9893 width=15)
> Group Key: facts.fk
> -> Nested Loop (cost=2.66..6135.95 rows=9893 width=15)
> -> HashAggregate (cost=2.24..3.23 rows=99 width=4)
> Group Key: random_fk_uniq.fk
> -> Seq Scan on random_fk_uniq (cost=0.00..1.99 ...)
> -> Index Scan using facts_fk_idx on facts (cost=0.42..60.95 ...)
> Index Cond: (fk = random_fk_uniq.fk)
>
> which is exactly the first plan from your post.
>
> I still don't understand why using the smaller table produces a less
> efficient plan, but the estimated costs are very close (20013 vs. 18147
> is a very small difference in this context), and the estimator shoots for
> minimal average error. So it may seem 'weirdly pessimistic' but it might
> be equally optimistic for other queries.
>
> Also, keep in mind that the cost model is just a simplification of
> reality, so it can't be correct 100% of the time.
>
> The fact that this small difference in costs results in a significant
> difference in duration suggests that the default optimizer cost values do
> not reflect your environment. For example, you assume that this is very
> fast:
>
> NestLoop
> -> Seq Scan on A
> -> Index Scan using B_Y_IDX on B
> Index Condition: B.Y = A.X
>
> but index scans often produce a lot of random I/O, and that may be quite
> expensive if it really hits the storage. And that's what the default
> value (random_page_cost=4) is set for. But if you're on a system with
> lots of RAM, or an SSD, ... that simply is not the case, and the default
> may penalize index scans (and prefer more sequential workloads).
>
> The other thing you might do is create an index on (fk,f), which on the
> newer releases produces this:
>
> QUERY PLAN
> -------------------------------------------------------------------------
> HashAggregate (cost=783.77..907.43 rows=9893 width=15)
> Group Key: facts.fk
> -> Nested Loop (cost=2.66..734.30 rows=9893 width=15)
> -> HashAggregate (cost=2.24..3.23 rows=99 width=4)
> Group Key: random_fk_uniq.fk
> -> Seq Scan on random_fk_uniq (cost=0.00..1.99 ...)
> -> Index Only Scan using facts_2_idx on facts (cost=...)
> Index Cond: (fk = random_fk_uniq.fk)
>
> That limits the amount of random I/O by scanning only the index, and is
> even faster than the first query.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>