Re: Inheritance planner CPU and memory usage change since 9.3.2

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inheritance planner CPU and memory usage change since 9.3.2
Date: 2015-06-18 18:44:16
Message-ID: CAEZATCVePyt1Xd5tZ5sW7VGMqquuRgufvyiqO0JkpHgBrvP8jQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 June 2015 at 14:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 17, 2015 at 9:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jun 17, 2015 at 9:32 AM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> We saw a rather extreme performance problem in a cluster upgraded from
>>> 9.1 to 9.3. It uses a largish number of child tables (partitions) and
>>> many columns. Planning a simple UPDATE via the base table started
>>> using a very large amount of memory and CPU time.
>>>
>>> My colleague Rushabh Lathia tracked the performance change down to
>>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
>>> .
>>>
>>> The call to copyObject in the loop introduced here seems to be
>>> problematic (copying append_rel_list for every element in
>>> append_rel_list unconditionally), though we're still trying to figure
>>> it out. Attached is a simple repro script, with variables to tweak.
>>>
>>> Quite a few others have posted about this sort of thing and been
>>> politely reminded of the 100 table caveat [1][2] which is fair enough,
>>> but the situations seems to have got dramatically worse for some users
>>> after an upgrade.
>>
>> Yes. The copyObject() call introduced by this commit seems to have
>> complexity O(T^2*C) where T is the number of child tables and C is the
>> number of columns per child. That's because the List that is being
>> copied is a list of AppendRelInfo nodes, each of which has a
>> translated_vars member that is a list of every Var in one table, and
>> we copy that list once per child.
>>
>> It appears that in a lot of cases this work is unnecessary. The
>> second (long) for loop in inheritance_planner copies root->rowMarks
>> and root->append_rel_list so as to be able to apply ChangeVarNodes()
>> to the result, but we only actually do that if the rte is of type
>> RTE_SUBQUERY or if it has security quals. In the common case where we
>> reach inheritance_planner not because of UNION ALL but just because
>> the table has a bunch of inheritance children (none of which have RLS
>> policies applied), we copy everything and then modify none of it,
>> using up startling large amounts of memory in ways that pre-9.2
>> versions did not.
>
> The attached patch helps. It does two things:
>
> 1. It arranges for inheritance_planner to throw away the memory
> consumed by the subroot's rowMarks and append_rel_list after the call
> to grouping_planner for that subroot returns. This prevents the
> explosive growth of memory usage in all cases I've tested so far, but
> planning is still really slow.
>
> 2. It arranges not to deep-copy append_rel_list when the root's
> append_rel_list doesn't need to be modified for the subroot. This
> makes planning much much faster in simple cases, like a simple update
> on a table with many partitions. But if you do attach a FROM clause
> containing a subquery to such an update, then this optimization
> doesn't kick in any more and things are still very slow (though still
> memory-bounded, due to part 1).
>
> I feel I might be missing a trick here. It seems unlikely to me that
> we actually need the entire append_rel_list for every subquery; and we
> almost certainly don't need to modify every element of the
> append_rel_list for every subquery. Even the ones that no
> ChangeVarNodes() call mutates still get deep-copied.
>

Yeah, you could probably pre-compute the indexes of the RTEs that need
to copied, outside of the big loop, and store them in a bitmapset.
Then, instead of copying the entire list of rowmarks/append_rel_infos
each time, you could just copy the ones that referred to those RTE
indexes (and only if the bitmapset was non-empty, which is the
equivalent of your second optimisation). However, for AppendRelInfos,
ChangeVarNodes() descends into the Vars in the translated_vars list,
so short-cutting the copying of the AppendRelInfo isn't obviously
safe. But, looking more closely, does ChangeVarNodes actually need to
examine translated_vars (the fall-through case) when child_relid isn't
the old rt_index? If not, that could be a big saving in cases like
this.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-06-18 19:14:14 Re: Inheritance planner CPU and memory usage change since 9.3.2
Previous Message Merlin Moncure 2015-06-18 17:47:20 Re: [PATCH] Function to get size of asynchronous notification queue