Re: Insert into on conflict, data size up to 3 billion records

From: Karthik K <kar6308(at)gmail(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Insert into on conflict, data size up to 3 billion records
Date: 2021-02-15 19:22:44
Message-ID: CAExg72f2xosd7iynrk3dYFjEnwDRSS2tiMt7FnaHfG2e2qrCuQ@mail.gmail.com
Lists: pgsql-general

Yes, I'm using \copy to load the batch table.

With the new design we are working on, we expect fewer updates and more
inserts going forward. One of the target columns I'm updating is indexed,
so I will drop that index and try it out. Also, from your suggestion above,
splitting the ON CONFLICT statement into a separate insert and update is
more performant, but in order to split the records into (low, high) batches
I first need a count of the primary keys on the batch tables.
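
Something like the following is what I'm considering for deriving the
(low, high) ranges; a rough sketch only, assuming the batch table is named
"batch" with a bigint primary key "id" and a batch width of 10000. Using
min/max instead of an exact count keeps the ranges cheap to compute:

-- Derive contiguous (low, high) id ranges of width 10000 from the
-- key's min/max; gaps in id only make some ranges sparse, which is
-- harmless for the BETWEEN filters.
WITH bounds AS (
  SELECT min(id) AS lo, max(id) AS hi FROM batch
)
SELECT lo + n * 10000           AS range_low,
       lo + (n + 1) * 10000 - 1 AS range_high
FROM bounds,
     generate_series(0, (hi - lo) / 10000) AS n;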

On Mon, Feb 15, 2021 at 11:06 AM Rob Sargent <robjsargent(at)gmail(dot)com> wrote:

>
>
> On 2/15/21 11:41 AM, Karthik K wrote:
> > Exactly. For now, since the table is already partitioned, I created 50
> > different connections and updated the target table by querying the
> > source partition tables directly. Are there any other techniques I can
> > use to speed this up? Also, when we use an ON CONFLICT statement for
> > both insert and update, does Postgres batch internally (committing
> > every 10000 records or so), or does it update all records at once? In
> > the latter case, does it create a new version of each record and swap
> > them all at once? I'm wondering how atomicity is guaranteed. And if I
> > do have to batch, other than by selecting from individual partitions,
> > does using batches of 10000 records help?
> >
> >
>
> What is your ratio of inserts versus updates? Can you separate the
> inserts from the updates? Is the target table indexed on anything other
> than the primary key? If so, can those indexes be dropped?
>
> Assuming you use \copy to load the batch tables
>
> I've found this strategy to be effective:
>
> -- index the batch table on id
> create index on batch (id);
>
> -- update first
> begin;
> update target t
>   set "all fields" = b."all fields"  -- placeholder: one assignment per column
>   from batch b
>   where t.id = b.id
>     and b.id between "low" and "high";
> commit;
> -- increment low and high, avoiding overlap; repeat
>
> -- insert
> begin;
> insert into target
>   select b.*
>   from batch b
>   where not exists (select 1 from target v where b.id = v.id)
>     and b.id between "low" and "high";
> commit;
> -- increment low and high, avoiding overlap; repeat
>

--
Regards,

Karthik K Kondamudi
