From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
Date: 2024-10-15 04:57:46
Message-ID: CAApHDvre3nuCLA4DVfYSW_EAiuV9FKaZV6k7_dPMSa9d6_Mg2Q@mail.gmail.com
Lists: pgsql-general

On Tue, 15 Oct 2024 at 06:59, Wong, Kam Fook (TR Technology)
<kamfook(dot)wong(at)thomsonreuters(dot)com> wrote:
> I am trying to copy a table (Postgres) that is close to 1 billion rows into a Partition table (Postgres) within the same DB. What is the fastest way to copy the data? This table has 37 columns where some of which are text data types.
Are you asking this because you're partitioning an existing table? If
so, you might want to consider whether speed really is the biggest
requirement. It is possible, for example, to partition a table
"online" by using table inheritance as an intermediate step: migrate
the rows in small batches into inheritance child tables that have
CHECK constraints matching the future partition constraints. Each
batch can be done with a CTE: a DELETE .. WHERE <clause to some small
batch of rows> RETURNING feeding an INSERT INTO new_table SELECT *
FROM cte;.
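As a rough sketch of one such batch (the table names, partitioning
column, and batch size here are all hypothetical, assuming range
partitioning by date):

  WITH moved AS (
      DELETE FROM original_table
      WHERE ctid IN (SELECT ctid FROM original_table
                     WHERE part_date < '2020-01-01'  -- hypothetical batch predicate
                     LIMIT 10000)                    -- keep each transaction small
      RETURNING *
  )
  INSERT INTO child_2019 SELECT * FROM moved;

DELETE has no LIMIT clause of its own, hence the ctid subquery to cap
the batch size; looping over that statement moves the rows without
ever holding a long-lived lock.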
Once the inheritance parent table is empty, you can then rearrange
the inheritance hierarchy into a partitioned table and its
partitions. The CHECK constraints will allow the tables to be
ATTACHed as partitions to a new partitioned table without having to
scan each one to ensure no rows violate the partition constraint. If
done correctly, the only blocking operations are a little DDL:
renaming a table and attaching all the partitions, all of which
should be metadata-only. You'll want to rehearse the migration a few
times away from production to help ensure it'll run smoothly on the
day.
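The final swap might look roughly like this (a sketch only, again
with hypothetical names and bounds; note each child must leave the
inheritance hierarchy before it can be attached):

  BEGIN;
  ALTER TABLE original_table RENAME TO original_table_old;
  CREATE TABLE original_table (LIKE original_table_old INCLUDING ALL)
      PARTITION BY RANGE (part_date);
  ALTER TABLE child_2019 NO INHERIT original_table_old;
  ALTER TABLE original_table ATTACH PARTITION child_2019
      FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
  -- ... repeat NO INHERIT / ATTACH PARTITION for the remaining children ...
  COMMIT;

Because each child's CHECK constraint already implies its partition
bound, ATTACH PARTITION skips the full-table validation scan.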
I'm not familiar with pg_bulkload, so I can't comment on the other
suggestions; however, I'd be surprised if exporting the data out of
and back into PostgreSQL were faster than having it remain inside
PostgreSQL. Not exporting/importing means you don't need to call the
output and input functions for every row and column. If you didn't
want to use inheritance tables as an intermediate step, you might
find it quite fast to start up a series of parallel jobs to INSERT
INTO partition_name SELECT * FROM original_table WHERE <rows for this
partition>;
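For example (hypothetical names again; run each statement in its own
session so they execute concurrently, with each WHERE clause matching
one partition's bounds exactly):

  -- session 1
  INSERT INTO part_2019 SELECT * FROM original_table
  WHERE part_date >= '2019-01-01' AND part_date < '2020-01-01';

  -- session 2
  INSERT INTO part_2020 SELECT * FROM original_table
  WHERE part_date >= '2020-01-01' AND part_date < '2021-01-01';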
David