Re: Postgres 11, partitioning with a custom hash function

From: Harry B <harrysungod(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres 11, partitioning with a custom hash function
Date: 2018-10-04 03:22:19
Message-ID: CAMG7=yVrcp6=v6HiAk_fvdTFZ9oZfMFQDuf7rQtecHveMxRabg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I am still having trouble reconciling what happens under the HASH
partitioning!. If I have text column forming the basis of PARTITIONED BY
HASH, the HASH value used in the partitioning setup does not seem to match
to `hashtext()` of that value

CREATE TABLE loopy (k TEXT PRIMARY KEY, something_else int) PARTITION BY
HASH(k);
CREATE TABLE loopy_00 PARTITION OF loopy FOR VALUES WITH (MODULUS 32,
REMAINDER 0);
CREATE TABLE loopy_01 PARTITION OF loopy FOR VALUES WITH (MODULUS 32,
REMAINDER 1);
-- setup all tables till 31

=> explain select * from loopy where k='a';
QUERY PLAN
-------------------------------------------------------------------------------------
Append (cost=0.15..2.38 rows=1 width=36)
-> Index Scan using loopy_30_pkey on loopy_30 (cost=0.15..2.37 rows=1
width=36)
Index Cond: (k = 'a'::text)
(3 rows)

So 'a' goes to 30

=> select hashtext('a'::text);
hashtext
------------
1075015857
(1 row)

=> select 1075015857::bit(32);
bit
----------------------------------
01000000000100110111000010110001
(1 row)

=> select 30::bit(32);
bit
----------------------------------
00000000000000000000000000011110
(1 row)

I am on intel cpu, x86_64, ubuntu lts 18.4.1

On Wed, Oct 3, 2018 at 9:37 AM Harry B <harrysungod(at)gmail(dot)com> wrote:

>
> Hi,
>
> Since I didn't hear back on how to make partitioning work using a custom
> hash function, I ended up changing my app/client to use postgres' built-in
> hash method instead - I just needed them to match.
>
> https://github.com/harikb/pghash
> https://github.com/harikb/pghash/blob/master/lib/pghash/pghash.go
>
> At some point, I will need to revisit this and figure out how to have PG
> partition using a custom hash function other than the builtin, or may be pg
> will switch to xxhash or siphash.
>
> On Mon, Oct 1, 2018 at 9:41 PM Harry B <harrysungod(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> I am interested in trying the hash partitioning method now available in
>> 11 (trying the beta 4). However, I have the data already hashed at the
>> application level across multiple postgres instances. If possible, I would
>> like to keep these two hashing methods same. This would enable me to move
>> a single partition (considering we have attach/detach methods available)
>> from one instance to another and have queries work seamlessly.
>>
>> The application can control what data/query is sent to each instance -
>> the only thing making this setup impossible is (a) the built-in HASH
>> function not available/accessible to me outside of pg context, say, as a C
>> library. and (b) I don't know how to sub a known hash function (say,
>> murmur, xxhash, siphash) instead of the builtin hash function. I am not
>> particularly insisting on any particular hash function, except for it to
>> available outside of postgres (say as a C or Go library).
>>
>> Based on a discussion in the IRC channel, I was told I could try creating
>> a custom hash function (postgres extension) and use it in RANGE or LIST
>> partitioning with that expression.
>>
>> I have the following code installed as a postgres extension
>> http://dpaste.com/2594KWM, takes an implementation of xxhash.c and
>> sticks it in as a postgres extension †
>> ~/tmp/pge$ gcc -fPIC -I/usr/include/postgresql/11/server -c pge.c && gcc
>> -shared -o pge.so pge.o
>>
>> Problem is that with this setup, partitioning of the writes/inserts work
>> fine, but I don't see pg excluding the unnecessary partitions on
>> read/queries
>>
>> http://dpaste.com/1C0XY3M
>>
>> This setup based on expression also has other issues - I can't mark k as
>> a primary key or have a unique key on that column. If it is declared as a
>> hash partition, I can have those.
>>
>> This question may be related to this thread as well
>> https://www.postgresql-archive.org/Hash-Functions-td5961137.html
>>
>> † surprisingly, this function works even if I don't compile in
>> xxhash.c/o into the .so - that is yet another side mystery to figure out.
>> It is like the symbol XXH64 is already available dynamically. I did have
>> plpython installed at some point. but this is a separate issue.
>>
>> --
>> Harry
>>
>
>
> --
> Harry
>

--
Harry

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2018-10-04 03:42:52 Re: Postgres 11, partitioning with a custom hash function
Previous Message Harry B 2018-10-03 16:37:11 Re: Postgres 11, partitioning with a custom hash function