Re: [HACKERS] Runtime Partition Pruning

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [HACKERS] Runtime Partition Pruning
Date: 2020-10-14 00:40:13
Message-ID: CAKU4AWqGNCAt4rGb7ZKKzPgfRZfp4DfO8fNicXWzKi9YdnBcaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 13, 2020 at 3:48 PM Amit Langote <amitlangote09(at)gmail(dot)com>
wrote:

> On Thu, Oct 8, 2020 at 8:25 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> > > I can understand Robert's idea now, he intended to resolve both the
> > > "initial-partition-prune" case and "runtime partition prune" case
> while I always think
> > > about the former case (Amit reminded me about that at the beginning,
> but I just
> > > wake up right now..)
> > >
> > > With the Robert's idea, I think we can do some hack at
> create_append_path,
> > > we can figure out the pruneinfo (it was done at create_append_plan
> now) at that
> > > stage, and then did a fix_append_path with such pruneinfo. We need to
> fix not
> > > only the cost of AppendPath, but also rows of AppendPath, For example:
> > > SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the
> > > plan is:
> > > Nest Loop
> > > Nest Loop
> > > t1
> > > Append (p)
> > > t2
> > >
> > > Then the rows of Append (p) will be very important.
> > >
> > > For this idea, how to estimate how many partitions/rows can be pruned
> is a key
> > > part. Robert said "I was thinking, rather, that if we know for
> example that
> > > we've doing pruning on partition_column = $1, then we know that only
> > > one partition will match. That's probably a common case. If we've
> > > got partition_column > $1, we could assume that, say, 75% of the
> > > partitions would match. partition_column BETWEEN $1 and $2 is
> > > probably a bit more selective, so maybe we assume 50% of the
> > > partitions would match.". I think we can't say the 75% or 50% is a
> good
> > > number, but the keypoint may be "partition_column = $1" is a common
> > > case. And for the others case, we probably don't make it worse.
> > >
> > > I think we need to do something here, any thoughts? Personally I'm more
> > > like this idea now.
> >
> > Yes, I think we have to do something on those lines. But I am
> > wondering why our stats machinery is failing to do that already. For
> > equality I think it's straight forward. But even for other operators
> > we should use the statistics from the partitioned table to estimate
> > the selectivity and then adjust number of scanned partitions = (number
> > of partitions * fraction of rows scanned). That might be better than
> > 50% or 75%.
>
> This is an interesting idea, that is, the idea of consulting parent
> relation's stats to guess at the number of partitions that might be
> scanned.
>
> However, we don't currently consult an inheritance parent relation's
> stats at all during "base" relation planning, which is why you don't
> see the parent relation's statistics reflected in the row count and
> costs assigned to its (Append at al) paths. The hard-coded rule is to
> sum up children's rows and their paths' costs; see cost_append().
>
> My thinking on this was that we'd just extend that hard-coded rule to
> take into account that run-time pruning, if applicable in a given
> plan, will cause some of those child paths to be discarded. Maybe
> Andy is headed in that direction?
>
>
Yes, that's what I am trying to do. The minimum code in my mind is:

create_append_path(...)
{

double run_time_prune_est;

if (enable_partition_prune && .. )
List * partition_prune_step_ops = cal_prune_step_ops(rel,
partitioned_rels);
run_time_prune_est =
estimate_runtime_prune_ratio(partition_prune_step_ops);
}
// adjust the rows/costs of AppendPath based on the run_time_prune_est.

The difference between '=' and '<' will be handled in function
estimate_runtime_prune_ratio. Since I still don't make my mind runnable
now,
some data structures may change, but the overall idea is something like
above.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-10-14 01:15:58 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Kyotaro Horiguchi 2020-10-14 00:06:28 Re: archive status ".ready" files may be created too early