Partitioning a table by integer value (preferably in place)

From: Pól Ua Laoínecháin <linehanp(at)tcd(dot)ie>
To: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Partitioning a table by integer value (preferably in place)
Date: 2021-08-09 12:16:16
Message-ID: CAF4RT5Si8N7S2MqRNYBrWKwEadnw6UjKvYK-Ze0ScB9cdKUvEw@mail.gmail.com

Hi all,

Linux Fedora 34
1TB Samsung SSD
4 CPUs, 2 cores

PostgreSQL 12.7 (I can upgrade if a better solution is to be found in 13
or even 14 beta2 - I'm currently testing a proposed solution, so by the
time it's fully implemented, 14 should be GA and hey, I might even
find a bug or two to help with the project!).

I have a 400GB joining table (one column a SMALLINT and the other an
INTEGER - both primary keys on other tables) with 1,000 distinct keys
on one side and 10M on the other, so 10,000M (or 10Bn) records all told.
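
For concreteness, here's roughly what I mean (table and column names
invented for illustration):

CREATE TABLE small_side (small_id SMALLINT PRIMARY KEY);
CREATE TABLE big_side   (big_id   INTEGER  PRIMARY KEY);

-- the 10Bn-row joining table: one row per (small_id, big_id) pair
CREATE TABLE join_table (
    small_id SMALLINT NOT NULL REFERENCES small_side,
    big_id   INTEGER  NOT NULL REFERENCES big_side,
    PRIMARY KEY (small_id, big_id)
);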

What I would like to do is to partition by the SMALLINT (1 - 1000)
value - which would give 1,000 tables of 400MB each.

I wish to avoid having to do this manually 1,000 times - is there a
way of telling PostgreSQL to partition by value without specifying the
SMALLINT value each time?
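
The closest I've come up with is generating the DDL in a loop - a rough
sketch (invented names as above, using LIST partitioning):

CREATE TABLE join_part (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY LIST (small_id);

-- one partition per SMALLINT value, 1..1000
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format(
            'CREATE TABLE join_part_%s PARTITION OF join_part FOR VALUES IN (%s)',
            i, i);
    END LOOP;
END;
$$;

Is there anything more built-in than that?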

I looked here:

https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql

and there is LIST, RANGE and HASH partitioning.

I think a RANGE of 1 would be what I want here?
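
By which I mean (if I have the syntax right) either of these spellings
of a single-value partition, using the same invented names:

-- on a LIST-partitioned parent:
CREATE TABLE join_part_1 PARTITION OF join_part FOR VALUES IN (1);

-- or, on a RANGE-partitioned parent (upper bound is exclusive):
CREATE TABLE join_part_1 PARTITION OF join_part FOR VALUES FROM (1) TO (2);

LIST with a single value looks like the more natural spelling, but I
may be missing something.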

So, questions:

Is 1,000 partitions reasonable? This:

https://elephas.io/is-there-a-limit-on-number-of-partitions-handled-by-postgres/

appears to suggest that it shouldn't be a problem?

Could I go with a RANGE of, say, 10 values per partition? If I have to
code it explicitly, I'd prefer this for my test - at least it would
save on the typing! :-)

This would produce 100 tables of ~4GB each. Would I see much
performance degradation with a 4GB table on an SSD?
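
If that's sensible, the same loop trick would do it - again a sketch
with invented names:

-- 100 RANGE partitions of 10 values each: [1,11), [11,21), ..., [991,1001)
CREATE TABLE join_part10 (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY RANGE (small_id);

DO $$
BEGIN
    FOR i IN 0..99 LOOP
        EXECUTE format(
            'CREATE TABLE join_part10_%s PARTITION OF join_part10
                 FOR VALUES FROM (%s) TO (%s)',
            i, i * 10 + 1, i * 10 + 11);
    END LOOP;
END;
$$;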

Finally, the icing on the cake would be if this could be done in place
- my SSD is 1TB and the output from df -h is:

test=# \! df -h
Filesystem                               Size  Used Avail Use% Mounted on
devtmpfs                                  16G     0   16G   0% /dev
tmpfs                                     16G  212K   16G   1% /dev/shm
tmpfs                                    6.3G  1.8M  6.3G   1% /run
/dev/mapper/fedora_localhost--live-root   69G   11G   55G  17% /
tmpfs                                     16G  284K   16G   1% /tmp
/dev/sda5                                976M  192M  718M  22% /boot
/dev/mapper/fedora_localhost--live-home  1.3T  898G  270G  77% /home
/dev/sda2                                 96M   52M   45M  54% /boot/efi
tmpfs                                    3.2G   96K  3.2G   1% /run/user/1000
test=#

So, I only have 270GB left on disk - and it took > 12 hrs to load the
table and build its indexes and Foreign Key constraints, so I'd like to
be able to do this without having to go through that again.

This:

https://www.2ndquadrant.com/en/blog/partitioning-a-large-table-without-a-long-running-lock/

appears to suggest that it can be done online. There will be no other
activity on the table while any partitioning &c. is in progress.
However, the article makes no mention of space considerations.
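
If I've read the article right, the core trick is to swap the existing
table in as a single partition rather than copying rows - something
like this (invented names again, and I may have details wrong):

BEGIN;
ALTER TABLE join_table RENAME TO join_table_legacy;
CREATE TABLE join_table (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY RANGE (small_id);
COMMIT;

-- a CHECK constraint matching the partition bounds lets ATTACH skip its
-- full-table validation scan; the article adds it NOT VALID and validates
-- separately to avoid holding a strong lock for the whole scan
ALTER TABLE join_table_legacy
    ADD CONSTRAINT legacy_bounds
    CHECK (small_id >= 1 AND small_id < 1001) NOT VALID;
ALTER TABLE join_table_legacy VALIDATE CONSTRAINT legacy_bounds;

ALTER TABLE join_table
    ATTACH PARTITION join_table_legacy FOR VALUES FROM (1) TO (1001);

But as far as I can tell, all 10Bn rows would still live in that one
legacy partition afterwards, so actually splitting them out into 1,000
(or 100) children would still mean moving the data - and presumably
needing the space for a second copy while that happens?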

This is my first time considering partitioning, so I'd be grateful for
any advice, pointers, references, URLs &c.... and please let me know
if I've left out any important information.

TIA and rgs,

Pól...

Some (relevant?) settings:

max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2

as suggested by pgtune. Is pgtune a good bet for configuration suggestions?
