From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: -HEAD planner issue wrt hash_joins on dbt3 ? |
Date: | 2006-09-13 07:35:30 |
Message-ID: | 4507B4C2.9060208@kaltenbrunner.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> Tom Lane wrote:
>>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?
>
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt
>
> Well, indeed it seems that the hash join is just an innocent bystander:
> the bulk of the runtime (all but about 120 sec in fact) is spent here:
>
> -> Nested Loop (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
> -> Merge Join (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
> ...
> -> Bitmap Heap Scan on lineitem (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
> Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
> -> Bitmap Index Scan on i_l_suppkey (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
> Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>
> I suppose that the profile result you showed was taken during the
> startup transient where it was computing the hashtables that this loop's
> results are joined to ... but that's not where the problem is. The
> problem is repeating that bitmap scan on lineitem for nearly 20000
> different l_suppkeys.
possible - I actually took them over a longer period of time
>
> Apparently we've made the planner a bit too optimistic about the savings
> that can be expected from repeated indexscans occurring on the inside of
> a join. The other plan uses a different join order and doesn't try to
> join lineitem until it's got orders.o_orderkey, whereupon it does a
> mergejoin against an indexscan on lineitem:
>
> -> Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
>
> The runtimes for the remainders of the plans are roughly comparable, so
> it's the cost of joining lineitem that is hurting here.
>
> Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the
> problem could be overestimating the cost of this indexscan.
>
> What are the physical sizes of lineitem and its indexes, and how do
> those compare to your RAM? What are you using for planner settings
> (particularly effective_cache_size)?
ouch - you are right(as usual) here.
effective_cache_size was set to 10GB(my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
as for the relation sizes:
dbt3=# select pg_relation_size('lineitem');
pg_relation_size
------------------
10832764928
(1 row)
dbt3=# select pg_total_relation_size('lineitem');
pg_total_relation_size
------------------------
22960259072
(1 row)
there are nine btree indexes on lineitem all between 1,1GB and 1,4GB in
size.
Stefan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2006-09-13 08:04:53 | Re: Optimizer improvements: to do or not to do? |
Previous Message | Gregory Stark | 2006-09-13 06:24:32 | Re: Simplifying "standby mode" |