Re: Small performance tweak to run-time partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Small performance tweak to run-time partition pruning
Date: 2018-10-09 02:02:28
Message-ID: CAKJS1f-Ghz+32SpyRD7Gz1ZG-5NPrk9xzCt0+cMX99SYL1Sfwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 October 2018 at 14:23, Imai, Yoshikazu
<imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com> wrote:
> I checked codes and I think so too.
>
> Confirmation of my understanding and For more information to others:
>
> The subplan map is used when we call ExecFindInitialMatchingSubPlans or
> ExecFindMatchingSubPlans.
> ExecFindInitialMatchingSubPlans is called by ExecInit(Merge)Append.
> ExecFindmatchingSubPlans is called by below fours which is executed after
> ExecInit(Merge)Append and it is called when the
> as_valid_subplans(or ms_valid_subplans) is not NULL.

Thanks for looking at this.

Yeah, so subplan_map is just an array that stores the List index of
the Append or MergeAppend's subplans. When we perform run-time pruning
during executor initialisation, if we prune away some of these
subplans then we don't initialise those pruned subplans at all which
results in missing executor nodes for those plans. Instead of
maintaining an array to store these with a bunch of NULLs in them to
represent the pruned subnodes, for performance reasons, we make a
gapless array and store them in there. This means that for the
run-time pruning that we could do running actual execution
(ExecFindMatchingSubPlans), the old subplan_map would be out of date,
as it would contain the indexes of the subplans as if we'd not pruned
any. We can simply not bother adjusting the subplan_map if no further
pruning is required. This could leave the map pointing to subplans
that don't exist, but we only need to care about that when the map is
actually going to be used for something. The good news is, we know in
advance if the map will be used again.

> I saw the patch and it seems good to me about the codes.
> I still couldn't check additional test cases in the patch.
>
>
> BTW, when I looking the codes, I thought there is further improvements in
> ExecInitAppend which is described below.
>
> j = i = 0;
> firstvalid = nplans;
> foreach(lc, node->appendplans)
> {
> if (bms_is_member(i, validsubplans))
> {
> Plan *initNode = (Plan *) lfirst(lc);
>
> /*
> * Record the lowest appendplans index which is a valid partial
> * plan.
> */
> if (i >= node->first_partial_plan && j < firstvalid)
> firstvalid = j;
>
> appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
> }
> i++;
> }
>
> If number of valid subplans is few compared to node->appendplans, it is a waste to check
> bms_is_member(i, validsubplans) for all node->appendplans.
> Of course, node->appendplans is List struct and we have to loop it to find valid plan,
> that we can't simply use while(bms_next_member(i, validsubplans)) loop.
> I don't have any good idea for it now, but can we improve it?

I do have other ideas for that but not ready to share code yet, but it
basically requires reimplementing List to use arrays as their
underlying implementation. This will allow the bms_next_member() type
loop and list_nth() will be O(1) instead of O(N).

It might be possible to squeeze a bit more performance out of this
code with the current List implementation, but I it might actually
slow performance in some cases, for example, if the only surviving
partition was one of the last ones in the List. Getting the final
element with list_nth() is optimized, but if you consider a
time-based, say monthly, RANGE partition, a DBA might maintain a
partition ready for the next month of data, so it might be very common
to access the 2nd last element in the list for all queries looking at
"this months" data. In that case, list_nth(), in its current form, is
as slow as can be. You'd also need to do a bms_num_members() or
bms_get_singleton_member(), in order to decide if the alternative
method is going to be any faster. That call is not going to be free.

It might also be possible to form the loop so that it calls
bms_next_member() then store the result and loop until we reach that
number. That would only save the bms_is_member call per loop, but I'd
much rather do the array idea as it should also speed up plenty of
other things.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Iwata, Aya 2018-10-09 02:14:52 RE: Function for listing archive_status directory
Previous Message Tom Lane 2018-10-09 01:55:34 Re: pread() and pwrite()