From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp>, "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speeding up planning with partitions |
Date: | 2019-03-30 15:11:27 |
Message-ID: | 21788.1553958687@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> On Sat, Mar 30, 2019 at 9:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What I propose we do about the GEQO problem is shown in 0001 attached
>> (which would need to be back-patched into v11).
>> ...
>> That's just dumb. What we *ought* to be doing in such degenerate
>> outer-join cases is just emitting the non-dummy side, ie
> Fwiw, I agree that we should fix join planning so that we get the
> ProjectionPath atop scan path of non-nullable relation instead of a
> full-fledged join path with dummy path on the nullable side. It seems
> to me that the "fix" would be mostly be localized to
> try_partitionwise_join() at least insofar as detecting whether we
> should generate a join or the other plan shape is concerned, right?
Well, if we're going to do something about that, I would like to see
it work for non-partition cases too, ie we're not smart about this
either:
regression=# explain select * from tenk1 left join (select 1 where false) ss(x) on unique1=x;
QUERY PLAN
-------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..570.00 rows=10000 width=248)
Join Filter: (tenk1.unique1 = 1)
-> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)
-> Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(5 rows)
A general solution would presumably involve new logic in
populate_joinrel_with_paths for the case where the nullable side is
dummy. I'm not sure whether that leaves anything special to do in
try_partitionwise_join or not. Maybe it would, since that would
have a requirement to build the joinrel without any RHS input
RelOptInfo, but I don't think that's the place to begin working on
this.
> By the way, does it make sense to remove the tests whose output
> changes altogether and reintroduce them when we fix join planning?
> Especially, in partitionwise_aggregate.out, there are comments near
> changed plans which are no longer true.
Good point about the comments, but we shouldn't just remove those test
cases; they're useful to exercise the give-up-on-partitionwise-join
code paths. I'll tweak the comments.
>> 0002 attached is then the rest of the partition-planning patch;
>> it doesn't need to mess with joinrels.c at all. I've addressed
>> the other points discussed today in that, except for the business
>> about whether we want your 0003 bitmap-of-live-partitions patch.
>> I'm still inclined to think that that's not really worth it,
>> especially in view of your performance results.
> I think the performance results did prove that degradation due to
> those loops over part_rels becomes significant for very large
> partition counts. Is there a better solution than the bitmapset that
> you have in mind?
Hm, I didn't see much degradation in what you posted in
<5c83dbca-12b5-1acf-0e85-58299e464a26(at)lab(dot)ntt(dot)co(dot)jp>.
I am curious as to why there seems to be more degradation
for hash cases, as per Yoshikazu-san's results in
<0F97FA9ABBDBE54F91744A9B37151A512BAC60(at)g01jpexmbkw24>,
but whatever's accounting for the difference probably
is not that. Anyway I still believe that getting rid of
these sparse arrays would be a better answer.
Before that, though, I remain concerned that the PartitionPruneInfo
data structure the planner is transmitting to the executor is unsafe
against concurrent ATTACH PARTITION operations. The comment for
PartitionedRelPruneInfo says in so many words that it's relying on
indexes in the table's PartitionDesc; how is that not broken by
898e5e329?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2019-03-30 15:25:12 | Re: jsonpath |
Previous Message | Magnus Hagander | 2019-03-30 15:01:50 | Re: Checksum errors in pg_stat_database |