From: Pól Ua Laoínecháin <linehanp(at)tcd(dot)ie>
To: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Partitioning a table by integer value (preferably in place)
Date: 2021-08-09 12:16:16
Message-ID: CAF4RT5Si8N7S2MqRNYBrWKwEadnw6UjKvYK-Ze0ScB9cdKUvEw@mail.gmail.com
Lists: pgsql-general
Hi all,
Linux Fedora 34
1TB Samsung SSD
4 CPUs, 2 cores
PostgreSQL 12.7 (can upgrade if a better solution is to be found in 13,
or even in 14 beta2 - I'm currently testing a proposed solution, so by
the time it's fully implemented, 14 should be GA, and hey, I might even
find a bug or two to help with the project!).
I have a 400GB joining table (just two columns, one SMALLINT and one
INTEGER, each a Primary Key on another table) with 1,000 distinct
values on one side and 10M on the other, so 10,000M (or 10Bn) records
all told.
What I would like to do is to partition by the SMALLINT (1 - 1000)
value - which would give 1,000 tables of 400MB each.
I wish to avoid having to do this manually 1,000 times - is there a
way of telling PostgreSQL to partition by value without specifying the
SMALLINT value each time?
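For what it's worth, what I had in mind is something like the
following - an untested sketch, where the table name (big_join_part)
and column names (small_id, big_id) are just placeholders for my real
schema - using a DO block to generate the 1,000 LIST partitions
instead of typing them by hand:

```sql
-- Untested sketch: one LIST partition per SMALLINT value.
-- All names here are placeholders, not my real schema.
CREATE TABLE big_join_part (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY LIST (small_id);

DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format(
            'CREATE TABLE big_join_part_%s PARTITION OF big_join_part
                 FOR VALUES IN (%s)', i, i);
    END LOOP;
END $$;
```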
I looked here:
https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
and there are LIST, RANGE and HASH partitioning.
I think a RANGE of width 1 (one SMALLINT value per partition) would be
what I want here?
So, questions:
Is 1,000 partitions reasonable? This:
https://elephas.io/is-there-a-limit-on-number-of-partitions-handled-by-postgres/
appears to suggest that it shouldn't be a problem?
Could I go with a RANGE of, say, 10 values per partition? If I have to
explicitly code, I'd prefer this for my test - at least it would save
on the typing! :-)
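If I do go with ranges of 10 values, I imagine the loop would only
need a small change (again untested, same placeholder names as above,
and assuming the parent table already exists with PARTITION BY RANGE):

```sql
-- Untested sketch: 100 RANGE partitions covering SMALLINT values
-- 1..1000, ten values per partition. Names are placeholders.
DO $$
BEGIN
    FOR i IN 0..99 LOOP
        EXECUTE format(
            'CREATE TABLE big_join_part_%s PARTITION OF big_join_part
                 FOR VALUES FROM (%s) TO (%s)',
            i, i * 10 + 1, i * 10 + 11);  -- upper bound is exclusive
    END LOOP;
END $$;
```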
This would produce 100 tables of ~4GB each. Would I see much
performance degradation with a 4GB table on an SSD?
Finally, the icing on the cake would be if this could be done in place
- my SSD is 1TB and the output from df -h is:
test=# \! df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 212K 16G 1% /dev/shm
tmpfs 6.3G 1.8M 6.3G 1% /run
/dev/mapper/fedora_localhost--live-root 69G 11G 55G 17% /
tmpfs 16G 284K 16G 1% /tmp
/dev/sda5 976M 192M 718M 22% /boot
/dev/mapper/fedora_localhost--live-home 1.3T 898G 270G 77% /home
/dev/sda2 96M 52M 45M 54% /boot/efi
tmpfs 3.2G 96K 3.2G 1% /run/user/1000
test=#
So, I only have 270 GB left on disk - and it took > 12 Hrs to fill it
with indexes and Foreign Key constraints so I'd like to be able to do
it without having to go through that again.
This:
https://www.2ndquadrant.com/en/blog/partitioning-a-large-table-without-a-long-running-lock/
appears to suggest that it can be done online. There will be no other
activity on the table while any partitioning &c. will be ongoing.
However, the article makes no mention of space considerations.
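From my reading of that article, the general shape (hedged - I haven't
tried this, and all names below are placeholders) is to rename the
existing table, create a new partitioned parent with the same columns,
and ATTACH the old table as a partition - so no copy of the 400GB is
needed up front:

```sql
-- Untested sketch of the "attach, then migrate" approach from the
-- article. Names are placeholders for my real schema.
ALTER TABLE big_join RENAME TO big_join_legacy;

CREATE TABLE big_join (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY RANGE (small_id);

-- Adding the CHECK as NOT VALID and validating separately avoids
-- holding a strong lock for the full-table scan; the matching
-- constraint then lets ATTACH PARTITION skip its own scan.
ALTER TABLE big_join_legacy
    ADD CONSTRAINT legacy_range
        CHECK (small_id >= 1 AND small_id <= 1000) NOT VALID;
ALTER TABLE big_join_legacy VALIDATE CONSTRAINT legacy_range;

ALTER TABLE big_join
    ATTACH PARTITION big_join_legacy FOR VALUES FROM (1) TO (1001);
```

But as far as I can see that still leaves everything in one physical
partition, with rows to be moved out gradually afterwards - hence my
question about the disk space needed during the migration.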
This is my first time considering partitioning, so I'd be grateful for
any advice, pointers, references, URLs &c.... and please let me know
if I've left out any important information.
TIA and rgs,
Pól...
Some (relevant?) settings:
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
as suggested by pgtune. Is pgtune a good bet for configuration suggestions?