From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-19 11:55:40
Message-ID: CADyhKSWnRTvVzpgFXDrQ5sgvY4E+kWor7Qfj4qCt2VrFDEs+0A@mail.gmail.com
Lists: pgsql-hackers
2015-08-19 20:12 GMT+09:00 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On 12 June 2015 at 00:29, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>>
>> I see two ways to fix this:
>>
>> (1) enforce the 1GB limit (probably better for back-patching, if that's
>> necessary)
>>
>> (2) make it work with hash tables over 1GB
>>
>> I'm in favor of (2) if there's a good way to do that. It seems a bit
>> stupid not to be able to use fast hash table because there's some artificial
>> limit. Are there any fundamental reasons not to use the
>> MemoryContextAllocHuge fix, proposed by KaiGai-san?
>
>
> If there are no objections, I will apply the patch for 2) to HEAD and
> backpatch to 9.5.
>
Please don't rush. :-)
It is not difficult to replace palloc() with palloc_huge(); however, it may lead
to another problem once the planner gives us a crazy estimation.
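
For illustration only, here is a minimal sketch of what such a swap might
look like inside ExecHashTableCreate() in nodeHash.c; it assumes the bucket
array is the allocation that hits the limit, and it is not a concrete patch:

    /*
     * Hypothetical fragment inside ExecHashTableCreate(), after switching
     * into hashtable->batchCxt.  MemoryContextAllocHuge() lifts the 1GB
     * MaxAllocSize limit, but nbuckets still comes straight from the
     * planner's row estimate, so a bad estimate turns into a huge
     * allocation.
     */
    hashtable->buckets = (HashJoinTuple *)
        MemoryContextAllocHuge(hashtable->batchCxt,
                               (Size) nbuckets * sizeof(HashJoinTuple));
    MemSet(hashtable->buckets, 0, (Size) nbuckets * sizeof(HashJoinTuple));
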
Below is my comment from the other thread.
==========
Also, we may need to pay attention to the reliability of the planner's
row estimates.
Even though the plan below says the join generates 60521928028 rows,
it actually generates 776157676 rows (about 1.3% of the estimate).
tpcds100=# EXPLAIN ANALYZE select
ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
from web_sales ws1,web_sales ws2
where ws1.ws_order_number = ws2.ws_order_number
and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028 width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
   Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
   Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
   Rows Removed by Join Filter: 127853313
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=73252.300..79017.420 rows=72001237 loops=1)
         Sort Key: ws1.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237 loops=1)
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=65095.655..128885.811 rows=904010978 loops=1)
         Sort Key: ws2.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237 loops=1)
 Planning time: 0.232 ms
 Execution time: 530176.521 ms
(14 rows)
So, even if we allow nodeHash.c to allocate hash buckets larger than
1GB, the initial size should still be chosen carefully.
Probably, 1GB is a good starting point, even if the table is expanded later.
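
As a rough sketch of that idea (hypothetical, not a patch): clamp the
initial bucket count so the first allocation stays within MaxAllocSize,
and rely on the existing dynamic resize path (ExecHashIncreaseNumBuckets)
to grow the table once the real number of tuples is known.

    /*
     * Hypothetical clamp before allocating the initial bucket array:
     * keep it within the ordinary 1GB palloc limit by halving nbuckets
     * (which is a power of two) until it fits.  If the estimate turns
     * out to be right, the bucket array can still be grown later by
     * the bucket-resize code.
     */
    while ((Size) nbuckets * sizeof(HashJoinTuple) > MaxAllocSize)
        nbuckets /= 2;
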
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>