From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | Greg Sabino Mullane <htamfids(at)gmail(dot)com> |
Cc: | Lok P <loknath(dot)73(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: How to do faster DML |
Date: | 2024-02-03 19:20:01 |
Message-ID: | CA+bJJbx0ObdkWJ-Q5qvsM+v1jqLA2bQi+Atqc675Ds5z5AjyEg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane <htamfids(at)gmail(dot)com> wrote:
...
> Given the size of your table, you probably want to divide that up.
> As long as nothing is changing the original table, you could do:
>
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 0;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 10_000_000;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 20_000_000;
Is it me or does this have the faint smell of quadratic behaviour? I
mean, you need to read and discard the first 10M to do offset 10M (
also I believe ctid order is implied in sequential scan, but no so
sure, if not it would need a full table sort on each pass ).
When doing things like this, I normally have some kind of unique index
and do it by selecting with limit above the last read value( stored
when doing it via code, as I normally load first, index later so I
cannot select max() fast on the target ). Or, with some kind of
"sparse" index (meaning, groups much less than the batch size ) and a
little code you can select where index_col > last order by index_col
limit 10M, remember last received index_col and reselect discarding
missing ( or just reselect it when doing your on conflict do nothing,
which also has a fast select max(id) due to the PK, it will work if it
has an index on id column on the original even if not unique ) to
avoid that.
Also, I'm not sure if ctid is ordered and you can select where
ctid>last ordered, if that works it probably is faster for immutable
origins.
Francisco Olarte.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter J. Holzer | 2024-02-03 20:29:47 | Re: How to do faster DML |
Previous Message | Greg Sabino Mullane | 2024-02-03 18:28:47 | Re: How to do faster DML |