Quick Links

Re: Partition-wise join for join between (declaratively) partitioned tables

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partition-wise join for join between (declaratively) partitioned tables
Date:	2017-09-13 07:21:36
Message-ID:	CAFjFpRfA464OVY25R4eC3MZPwbmiV=1fsAqLX_rG0Wrj5_NEgQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Sep 13, 2017 at 12:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 12, 2017 at 3:46 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> In this case, AcquireExecutorLocks will lock all the relations in
>> PlannedStmt.rtable, which must include all partitioned tables of all
>> partition trees involved in the query. Of those, it will lock the tables
>> whose RT indexes appear in PlannedStmt.nonleafResultRelations with
>> RowExclusiveLock mode. PlannedStmt.nonleafResultRelations is a global
>> list of all partitioned table RT indexes obtained by concatenating
>> partitioned_rels lists of all ModifyTable nodes involved in the query
>> (set_plan_refs does that). We need to distinguish nonleafResultRelations,
>> because we need to take the stronger lock on a given table before any
>> weaker one if it happens to appear in the query as a non-result relation
>> too, to avoid lock strength upgrade deadlock hazard.
>
> Hmm. The problem with this theory in my view is that it doesn't
> explain why InitPlan() and ExecOpenScanRelation() lock the relations
> instead of just assuming that they are already locked either by
> AcquireExecutorLocks or by planning. If ExecLockNonLeafAppendTables()
> doesn't really need to take locks, then ExecOpenScanRelation() must
> not need to do it either. We invented ExecLockNonLeafAppendTables()
> on the occasion of removing the scans of those tables which would
> previously have caused ExecOpenScanRelation() to be invoked, so as to
> keep the locking behavior unchanged.
>
> AcquireExecutorLocks() looks like an odd bit of code to me. The
> executor itself locks result tables in InitPlan() and then everything
> else during InitPlan() and all of the others later on while walking
> the plan tree -- comments in InitPlan() say that this is to avoid a
> lock upgrade hazard if a result rel is also a source rel. But
> AcquireExecutorLocks() has no such provision; it just locks everything
> in RTE order. In theory, that's a deadlock hazard of another kind, as
> we just talked about in the context of EIBO. In fact, expanding in
> bound order has made the situation worse: before, expansion order and
> locking order were the same, so maybe having AcquireExecutorLocks()
> work in RTE order coincidentally happened to give the same result as
> the executor code itself as long as there are no result relations.
> But this is certainly not true any more. I'm not sure it's worth
> expending a lot of time on this -- it's evidently not a problem in
> practice, or somebody probably would've complained before now.
>
> But that having been said, I don't think we should assume that all the
> locks taken from the executor are worthless because plancache.c will
> always do the job for us. I don't know of a case where we execute a
> saved plan without going through the plan cache, but that doesn't mean
> that there isn't one or that there couldn't be one in the future.
> It's not the job of these partitioning patches to whack around the way
> we do locking in general -- they should preserve the existing behavior
> as much as possible. If we want to get rid of the locking in the
> executor altogether, that's a separate discussion where, I have a
> feeling, there will prove to be better reasons for the way things are
> than we are right now supposing.
>

I agree that it's not the job of these patches to change the locking
or even get rid of partitioned_rels. In order to continue returning
partitioned_rels in Append paths esp. in the case of queries involving
set operations and partitioned table e.g "select 1 from t1 union all
select 2 from t1;" in which t1 is multi-level partitioned table, we
need a fix in add_paths_to_append_rels(). The fix provided in [1] is
correct but we will need a longer explanation of why we have to
involve RTE_SUBQUERY with RELKIND_PARTITIONED_TABLE. The explanation
is complicated. If we get rid of partitioned_rels, we don't need to
fix that code in add_paths_to_append_rel().

I suggested that [2]
-- (excerpt from [2])

Actually, the original problem that caused this discussion started
with an assertion failure in get_partitioned_child_rels() as
Assert(list_length(result) >= 1);

This assertion fails if result is NIL when an intermediate partitioned
table is passed. May be we should assert (result == NIL ||
list_length(result) == 1) and allow that function to be called even
for intermediate partitioned partitions for which the function will
return NIL. That will leave the code in add_paths_to_append_rel()
simple. Thoughts?
--

Amit Langote agrees with this. It kind of makes the assertion lame but
keeps the code sane. What do you think?

[1] https://www.postgresql.org/message-id/d2f1cdcb-ebb4-76c5-e471-79348ca5d7a7@lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/CAFjFpRfJ3GRRmmOugaMA-q4i=se5P6yjZ_C6A6HDRDQQTGXy1A@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-09-12 19:09:38 from Robert Haas

Responses

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-09-13 07:38:09 from Amit Langote
Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-09-13 16:56:17 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ashutosh Bapat	2017-09-13 07:30:17	Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message	Michael Paquier	2017-09-13 07:20:36	Re: Setting pd_lower in GIN metapage