From: "btober(at)computer(dot)org" <btober(at)broadstripe(dot)net>
To: Jonathan Vanasco <postgres(at)2xlp(dot)com>
Cc: pgsql-general general <pgsql-general(at)postgresql(dot)org>
Subject: Re: efficiently migrating 'old' data from one table to another
Date: 2017-01-12 20:19:19
Message-ID: 1464496899.139156708.1484252359001.JavaMail.zimbra@broadstripe.net
Lists: pgsql-general
----- Original Message -----
> From: "Jonathan Vanasco" <postgres(at)2xlp(dot)com>
> To: "pgsql-general general" <pgsql-general(at)postgresql(dot)org>
> Sent: Thursday, January 12, 2017 3:06:14 PM
> Subject: [GENERAL] efficiently migrating 'old' data from one table to another
>
> I'm just wondering if there's a more efficient way of handling a certain
> periodic data migration.
>
> We have a pair of tables with this structure:
>
> table_a__live
> column_1 INT
> column_2 INT
> record_timestamp TIMESTAMP
>
> table_a__archive
> column_1 INT
> column_2 INT
> record_timestamp TIMESTAMP
>
> Periodically, we must migrate items that are 'stale' from `table_a__live` to
> `table_a__archive`. The entries are copied over to the archive, then
> deleted.
>
> The staleness is calculated based on age, so we need to use INTERVAL. The
> "live" table can have anywhere from 100k to 20MM records.
>
> The primary key on `table_a__live` is a composite of column_1 & column_2.
>
> To minimize scanning the table, we flag rows for migration with a
> dedicated column:
>
> ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
> CREATE INDEX idx_table_a__live_migrate ON table_a__live (is_migrate)
>     WHERE is_migrate IS NOT NULL;
>
> So our migration is then based on that `is_migrate` column:
>
> BEGIN;
> UPDATE table_a__live SET is_migrate = TRUE
>     WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
> INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
>     SELECT column_1, column_2, record_timestamp FROM table_a__live
>     WHERE is_migrate IS TRUE;
> DELETE FROM table_a__live WHERE is_migrate IS TRUE;
> COMMIT;
>
> The inserts & deletes are blazing fast, but the UPDATE is slow, since
> Postgres writes a new version of every matched row.
>
> Can anyone suggest a better approach?
>
> I considered copying everything to a tmp table and then inserting/deleting
> based on that table, but there's a lot of disk I/O in that approach too.
Review manual section 7.8.2, "Data-Modifying Statements in WITH":
https://www.postgresql.org/docs/9.6/static/queries-with.html
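
For example, something along these lines (an untested sketch, using the
column names from your post; with a data-modifying WITH statement the
is_migrate column and its extra UPDATE pass aren't needed):

-- The DELETE's RETURNING clause feeds the removed rows directly
-- into the INSERT, so each stale row is read and written only once.
WITH moved AS (
    DELETE FROM table_a__live
    WHERE record_timestamp <
          transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month'
    RETURNING column_1, column_2, record_timestamp
)
INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
SELECT column_1, column_2, record_timestamp
FROM moved;

Since it is a single statement it is atomic on its own, with no explicit
transaction block needed and no window where a row exists in both tables
or in neither.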
-- B