Re: Bulk Inserts

From: Souvik Bhattacherjee <kivuosb(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Bulk Inserts
Date: 2019-08-13 13:34:35
Message-ID: CAANrPSejKbNVwMBEioHipAM3B6TA8=M_dxiaXHg82BDiO4MOdQ@mail.gmail.com
Lists: pgsql-general

> If the selects are returning more than one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like (pseudo-Python-like code):

> attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
>
> for val_batch in attr2_vals:
>     BEGIN
>     for id in val_batch:
>         insert into tab2 (attr1, attr2) (select attr1, attr2
>             from tab1 where attr2 = id)
>     COMMIT

For *EXP 1: inserts with multiple txn:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 20);

tab1 has ~6M rows and there are only two values for attr2 in tab1, which
are evenly distributed. So, yes, I guess I'm already doing batching here.
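
As I read the suggestion, for my two attr2 values it would amount to grouping
the two EXP 1 statements into a single transaction per batch, roughly like this
(a minimal sketch in plain SQL):

BEGIN;
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 20);
COMMIT;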

Also, I ran the following two statements to see if their performance is
comparable. STMT 1 always runs faster on my machine, but their runtimes
differ by at most a couple of seconds.

STMT 1: select attr1, attr2 into tab2 from tab1;
STMT 2: insert into tab2 (select attr1, attr2 from tab1);
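
For anyone reproducing the comparison, one way to time the two in psql would be
something like the following (column types here are just placeholders; tab2 is
dropped between runs since STMT 1 creates it while STMT 2 needs it to exist):

\timing on

drop table if exists tab2;
select attr1, attr2 into tab2 from tab1;            -- STMT 1 (creates tab2)

drop table if exists tab2;
create table tab2 (attr1 int, attr2 int);           -- assumed column types
insert into tab2 (select attr1, attr2 from tab1);   -- STMT 2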

However, adding the serial id column via an ALTER TABLE statement actually
takes more time than inserting the tuples, so the combined time is more than
double the time taken to insert the tuples into tab2 without the serial id
column.
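
Roughly, the two variants look like this (a minimal sketch with assumed column
types; adding the serial column afterwards rewrites the whole table to populate
the new column, which is likely why it is so expensive):

-- variant A: insert first, add the serial id afterwards (forces a table rewrite)
create table tab2 (attr1 int, attr2 int);           -- assumed column types
insert into tab2 (select attr1, attr2 from tab1);
alter table tab2 add column id serial;

-- variant B: create the serial column up front; the insert fills it directly
create table tab2 (id serial, attr1 int, attr2 int);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1);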

Best,
-SB

On Sun, Aug 11, 2019 at 11:11 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 8/10/19 7:47 PM, Souvik Bhattacherjee wrote:
> > Hi Adrian,
> >
> > Thanks for the response.
> >
> > > Yes, but you will need some code via client or function that batches
> > > the inserts for you.
> >
> > Could you please elaborate a bit on how EXP 1 could be performed such
> > that it uses bulk inserts?
>
> I guess it comes down to what you define as bulk inserts. From your OP:
>
> EXP 1: inserts with multiple txn:
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 10);
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 20);
>
> If the selects are returning more than one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like (pseudo-Python-like code):
>
> attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
>
> for val_batch in attr2_vals:
>     BEGIN
>     for id in val_batch:
>         insert into tab2 (attr1, attr2) (select attr1, attr2
>             from tab1 where attr2 = id)
>     COMMIT
>
> >
> > Best,
> > -SB
>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
