Quick Links

Re: [POC] hash partitioning

From:	amul sul <sulamul(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [POC] hash partitioning
Date:	2017-10-24 11:21:27
Message-ID:	CAAJ_b97R2rJinGPAVmZZzpNV=-5BgYFxDfY9HYdM1bCYJFGmQw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Oct 12, 2017 at 6:38 PM, amul sul <sulamul(at)gmail(dot)com> wrote:
> On Thu, Oct 12, 2017 at 6:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Oct 10, 2017 at 7:07 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
>>> How about the attached patch(0003)?
>>> Also, the dim variable is renamed to natts.
>>
>> I'm not sure I believe this comment:
>>
>> + /*
>> + * We arrange the partitions in the ascending order of their modulus
>> + * and remainders. Also every modulus is factor of next larger
>> + * modulus. This means that the index of a given partition is same as
>> + * the remainder of that partition. Also entries at (remainder + N *
>> + * modulus) positions in indexes array are all same for (modulus,
>> + * remainder) specification for any partition. Thus datums array from
>> + * both the given bounds are same, if and only if their indexes array
>> + * will be same. So, it suffices to compare indexes array.
>> + */
>>
>> I am particularly not sure that I believe that the index of a
>> partition must be the same as the remainder. It doesn't seem like
>> that would be true when there is more than one modulus or when some
>> partitions are missing.
>>
>
> Looks like an explanation by the comment is not good enough, will think on this.
>
> Here are the links for the previous discussion:
> 1] https://postgr.es/m/CAFjFpRfHqSGBjNgJV2p%2BC4Yr5Qxvwygdsg4G_VQ6q9NTB-i3MA%40mail.gmail.com
> 2] https://postgr.es/m/CAFjFpRdeESKFkVGgmOdYvmD3d56-58c5VCBK0zDRjHrkq_VcNg%40mail.gmail.com
>
I have modified the comment little bit, now let me explain the theory behind it.

rd_partdesc->boundinfo->indexes array stores an index in rd_partdesc->oids
array corresponding to a given partition falls at the positions. And position in
indexes array is decided using remainder + N * modulus_of_that_partition
(where N = 0,1,2,..,).

For the case where the same modulus, the remainder will be 0,1,2,..,
and the index of that partition will be at 0,1,2,..,. (N=0).

For the case where more than one modulus then an index of a partition oid in the
oids array could be stored at the multiple places in indexes array if
its modulus is < greatest_modulus amongst bound (where N = 0,1,2,..,).

For example, partition bound (Modulus, remainder) = p1(2,0), p2(4,1),
p3(8,3), p4(8,7) Oids array [p1,p2,p3,p4] sorted by Modulus and then
by remainder and indexes array [0, 1, 0, 3, 0, 1, 0, 4] size of indexes
array is greatest_modulus.

In other word, if a partition index in oids array in the indexes array is
stored multiple times, then the lowest of the differences between them
is the modulus of that partition. In above case for the partition p1, index
in oids array stored at 0,2,4,6. You can see lowest is the remainder and
minimum difference is the modulus of p1.

Since indexes arrays in both the bounds are same, for a given index in oids
array, the positions where it falls is same for both bounds. One can argue that
two different moduli could have the same remainder position, which is
not allowed
because that will cause partition overlap error at creation and also we have a
restriction on modulus that each modulus in the hash partition bound should be
the factor of next modulus.

> [....]
>
>> +static uint64
>> +mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)
>>
> How about combining high 32 bits and the low 32 bits separately as shown below?
>
> static inline uint64
> hash_combine64(uint64 a, uint64 b)
> {
> return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
> | hash_combine((unit32) a, (unit32) b));
> }
>
I have used hash_combine64 function suggested by Andres [1].

>[....]
>> Have you checked how well the tests you've added cover the code you've
>> added? What code is not covered by the tests, and is there any way to
>> cover it?
>>
> Will try to get gcov report for this patch.
>
Tests in the attached patch covers almost all the code expect few[2].

Updated patch attached.

1] https://postgr.es/m/20171012194353.3nealiykmjura4bi%40alap3.anarazel.de
2] Refer gcov_output.txt attachment.

Regards,
Amul Sul

Attachment	Content-Type	Size
gcov_output.txt	text/plain	1.1 KB
0001-partition_bounds_copy-code-refactoring-v1.patch	application/octet-stream	2.1 KB
0002-hash-partitioning_another_design-v25.patch	application/octet-stream	88.9 KB
0003-Enable-partition-wise-join-support-v3.patch	application/octet-stream	10.7 KB

In response to

Re: [POC] hash partitioning at 2017-10-12 13:08:37 from amul sul

Responses

Re: [POC] hash partitioning at 2017-10-29 07:08:28 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2017-10-24 11:30:04	Re: [POC] hash partitioning
Previous Message	anthony	2017-10-24 11:01:29	Transform for pl/perl