| From: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
|---|---|
| To: | Amit Langote <amitlangote09(at)gmail(dot)com> | 
| Cc: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Declarative partitioning - another take | 
| Date: | 2016-12-07 04:38:03 | 
| Message-ID: | CA+Tgmob9o3+xgYHtGbfzUWm9bknRryx53Q5pEVFkzVHSDO=QXQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Wed, Nov 30, 2016 at 10:56 AM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> The latest patch I posted earlier today has this implementation.
I decided to try out these patches today with #define
CLOBBER_CACHE_ALWAYS 1 in pg_config_manual.h, which found a couple of
problems:
1. RelationClearRelation() wasn't preserving the rd_partkey, even
though there's plenty of code that relies on it not changing while we
hold a lock on the relation - in particular, transformPartitionBound.
2. partition_bounds_equal() was using the comparator and collation for
partitioning column 0 to compare the datums for all partitioning
columns.  It's amazing this passed the regression tests.
The attached incremental patch fixes those things and some cosmetic
issues I found along the way.
3. RelationGetPartitionDispatchInfo() is badly broken:
1010         pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
1011         pd[i]->relid = RelationGetRelid(partrel);
1012         pd[i]->key = RelationGetPartitionKey(partrel);
1013         pd[i]->keystate = NIL;
1014         pd[i]->partdesc = partdesc;
1015         pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
1016         heap_close(partrel, NoLock);
1017
1018         m = 0;
1019         for (j = 0; j < partdesc->nparts; j++)
1020         {
1021             Oid     partrelid = partdesc->oids[j];
This code imagines that pointers it extracted from partrel are certain
to remain valid after heap_close(partrel, NoLock), perhaps on the
strength of the fact that we still retain a lock on the relation.  But
this isn't the case.  As soon as nobody has the relation open, a call
to RelationClearRelation() will destroy the relcache entry and
everything to which it points; with CLOBBER_CACHE_ALWAYS, I see a
failure at line 1021:
#0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
1021                Oid        partrelid = partdesc->oids[j];
(gdb) bt 5
#0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
#1  0x0000000109b8d71f in ExecInitModifyTable (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at nodeModifyTable.c:1730
#2  0x0000000109b5e7ac in ExecInitNode (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at execProcnode.c:159
#3  0x0000000109b58548 in InitPlan (queryDesc=0x7fd12b87b638,
eflags=0) at execMain.c:961
#4  0x0000000109b57dcd in standard_ExecutorStart
(queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:239
(More stack frames follow...)
Current language:  auto; currently minimal
(gdb) p debug_query_string
$1 = 0x7fd12b84c238 "insert into list_parted values (null, 1);"
(gdb) p partdesc[0]
$2 = {
  nparts = 2139062143,
  oids = 0x7f7f7f7f7f7f7f7f,
  boundinfo = 0x7f7f7f7f7f7f7f7f
}
As you can see, the partdesc is no longer valid here.  I'm not
immediately sure how to fix this; this isn't a simple thinko.  You
need to keep the relations open for the whole duration of the query,
not just long enough to build the dispatch info.  I think you should
try to revise this so that each relation is opened once and kept open;
maybe the first loop should be making a pointer-list of Relations
rather than an int-list of relation OIDs.  And it seems to me (though
I'm getting a little fuzzy here because it's late) that you need all
of the partitions open, not just the ones that are subpartitioned,
because how else are you going to know how to remap the tuple if the
column order is different?  But I don't see this code doing that,
which makes me wonder if the partitions are being opened yet again in
some other location.
I recommend that once you fix this, you run 'make check' with #define
CLOBBER_CACHE_ALWAYS 1 and look for other hazards.  Such mistakes are
easy to make with this kind of patch.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
| Attachment | Content-Type | Size | 
|---|---|---|
| partition-fixes-rmh.patch | text/x-patch | 5.8 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2016-12-07 04:43:56 | Re: Back-patch use of unnamed POSIX semaphores for Linux? | 
| Previous Message | Michael Paquier | 2016-12-07 04:26:38 | Re: Quorum commit for multiple synchronous replication. |