Re: immutable function querying table for partitioning

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: immutable function querying table for partitioning
Date: 2021-06-15 13:26:39
Message-ID: CAM+6J95P_tNZz_EB1zeCha-kJgKFOsQE9_=mmcrEA+iVfZ_Dhg@mail.gmail.com
Lists: pgsql-general

On Tue, 15 Jun 2021 at 18:21, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
wrote:

> You probably avoid the complications by doing the above, but the amount
> of bloat you are causing seems excessive.
>
> I’d suggest an approach where you use the table data to build DDL in a
> form that does adhere to the limitations described in the documentation,
> instead of building your system upon a lie that your immutable function
> is actually a stable one.
>
> David J.

Yes, I do understand that. That was just a simple example.
I am just playing with approaches; this is not a work requirement.
This would be a simulation of a sharded setup across servers.
In reality I would move only the relevant data and run a truncate to get
rid of the bloat once the data is moved, but that was not the point.

I wanted sharding to be an iterative setup, as I will make bad decisions.
I want to be able to avoid hotspots as well as too much scatter-gather.
I tried the FDW approach; the problem with it is that it opens a cursor and
then runs through all the shards sequentially, so when there are too many
shards it degrades linearly.
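
This is roughly the kind of setup I mean (just a sketch; the server, host,
and table names are made up for illustration):

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'app', port '5432');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');

CREATE TABLE orders (
    customer_id bigint NOT NULL,
    order_id    bigint NOT NULL,
    amount      numeric
) PARTITION BY HASH (customer_id);

-- each partition is a foreign table on a different shard; a query that
-- hits many partitions drains each remote cursor one after another
CREATE FOREIGN TABLE orders_p0
    PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    SERVER shard1 OPTIONS (table_name 'orders_p0');
-- ... and so on for the remaining shards/remainders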

I also tried dblink to run an async scatter-gather for the cases where the
aggregation/limit could not be pushed down.
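
Something along these lines (again only a sketch; the connection strings
and the query are illustrative):

CREATE EXTENSION IF NOT EXISTS dblink;

SELECT dblink_connect('c1', 'host=shard1.internal dbname=app');
SELECT dblink_connect('c2', 'host=shard2.internal dbname=app');

-- fire the same query on all shards without waiting for each one
SELECT dblink_send_query('c1', 'SELECT count(*) FROM orders');
SELECT dblink_send_query('c2', 'SELECT count(*) FROM orders');

-- gather the partial results and aggregate locally, because the
-- aggregate is not pushed down in this approach
SELECT sum(cnt) FROM (
    SELECT * FROM dblink_get_result('c1') AS t(cnt bigint)
    UNION ALL
    SELECT * FROM dblink_get_result('c2') AS t(cnt bigint)
) s;

SELECT dblink_disconnect('c1');
SELECT dblink_disconnect('c2');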

A lot also depends on fetch_size across shards. I mean, if it is too low,
it takes forever to get data that does not get reduced at the remote
server; if it is too high, the transfer is fast, but a random query result
that transfers a lot of data ends in an OOM. I know I have to tune these
things, but I cannot anticipate data growth.
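
For postgres_fdw that is the fetch_size option, which can be set per server
or per foreign table (the values here are just placeholders):

ALTER SERVER shard1 OPTIONS (ADD fetch_size '10000');
-- or override it for a single foreign table
ALTER FOREIGN TABLE orders_p0 OPTIONS (ADD fetch_size '1000');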

I was thinking of a way to keep all related data in one partition.
I mean, if I have a setup with highly normalized tables, I could use a
custom function to ensure all related data remains in one shard.

While studying how others approach sharding, I saw that Oracle keeps all
related data in one shard using partition by reference.
Since we do not have that in Postgres, I thought I could use a custom
function (sketched below the link).
sharding-schema-design.html
sharding-schema-design.html
<https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-schema-design.html#GUID-4E2E01CA-5103-4F4D-AB2A-79527A62B2E7>
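
Roughly what I have in mind (only a sketch; the table, column, and function
names are hypothetical, and the function is declared IMMUTABLE even though
it really is not, which is exactly the lie David is warning about):

CREATE TABLE customer_shard_map (
    customer_id bigint PRIMARY KEY,
    shard_no    int NOT NULL
);

CREATE FUNCTION shard_for_customer(p_customer_id bigint)
RETURNS int
LANGUAGE sql IMMUTABLE  -- declared immutable only so it is usable as a partition key
AS $$
    SELECT shard_no FROM customer_shard_map WHERE customer_id = p_customer_id
$$;

-- all child tables of a customer route through the same function,
-- so related rows land in the same partition/shard
CREATE TABLE order_items (
    customer_id bigint NOT NULL,
    item_id     bigint NOT NULL
) PARTITION BY LIST ((shard_for_customer(customer_id)));

CREATE TABLE order_items_s0 PARTITION OF order_items FOR VALUES IN (0);
CREATE TABLE order_items_s1 PARTITION OF order_items FOR VALUES IN (1);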
But these are experiments; I am not losing anything if I fail :)
I also want to figure out ways to rebalance without downtime.

--
Thanks,
Vijay
Mumbai, India
