Re: Bulk Inserts

From: Souvik Bhattacherjee <kivuosb(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Bulk Inserts
Date: 2019-08-13 16:32:35
Message-ID: CAANrPSfx6D0qQdh6gEAQA4pcE-gk34EOS4Ac0b1ieaAKngmvZA@mail.gmail.com
Lists: pgsql-general

> All I have left is:

> select row_number() over (order by attr2, attr1) as id, attr1,
> attr2 into tab2 from tab1;

> That will not create a serial type in the id column though. You can
> attach a sequence to that column. Something like:

> 1) create sequence tab2_id start <max id + 1> owned by tab2.id;

> 2) alter table tab2 alter COLUMN id set default nextval('tab2_id');

Thanks. This is a bit indirect but works fine. Performance-wise, this turns
out to be the best approach when inserting rows from one table into another
(new) table with a serial id column.
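
Putting the quoted steps together as one PostgreSQL sketch (table and column
names as used in this thread; the sequence start value below is purely
illustrative and must be replaced with max(id) + 1 actually read from tab2):

```sql
-- 1) Copy rows into a new table, numbering them as we go.
select row_number() over (order by attr2, attr1) as id, attr1, attr2
into tab2
from tab1;

-- 2) Create a sequence that continues past the highest assigned id.
--    The start value 1000001 is illustrative only; use the real
--    max(id) + 1 from tab2. OWNED BY ties the sequence's lifetime
--    to the column.
create sequence tab2_id start 1000001 owned by tab2.id;

-- 3) Have the sequence feed the id column on future inserts.
alter table tab2 alter column id set default nextval('tab2_id');
```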

Best,
-SB

On Tue, Aug 13, 2019 at 11:08 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 8/13/19 6:34 AM, Souvik Bhattacherjee wrote:
> > > If the selects are returning more then one row then you are already
> > > doing bulk inserts. If they are returning single rows or you want to
> > > batch them then you need some sort of code to do that. Something
> > > like(pseudo Python like code):
> >
> > > attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]
> >
> > > for val_batch in attr2_vals:
> > >     BEGIN
> > >     for id in val_batch:
> > >         insert into tab2 (attr1, attr2) (select attr1, attr2
> > >                                          from tab1 where attr2 = id)
> > >     COMMIT
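
The pseudocode above can be sketched as a self-contained demo. This uses
Python's sqlite3 purely for illustration (the thread targets PostgreSQL);
the table and column names follow the thread, and the data is made up:

```python
import sqlite3

# In-memory database standing in for the real PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("create table tab1 (attr1 integer, attr2 integer)")
conn.execute("create table tab2 (attr1 integer, attr2 integer)")

# Toy data: 100 rows for each of eight attr2 values.
conn.executemany(
    "insert into tab1 values (?, ?)",
    [(i, v) for v in (10, 20, 30, 40, 50, 60, 70, 80) for i in range(100)])
conn.commit()

# One transaction per batch of attr2 values, as in the pseudocode.
attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
for val_batch in attr2_vals:
    with conn:  # BEGIN ... COMMIT wrapped around the whole batch
        for val in val_batch:
            conn.execute(
                "insert into tab2 (attr1, attr2) "
                "select attr1, attr2 from tab1 where attr2 = ?",
                (val,))

print(conn.execute("select count(*) from tab2").fetchone()[0])  # 800
```

Each `with conn:` block commits once per batch rather than once per
statement, which is the point of batching the inserts.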
> >
> > For *EXP 1: inserts with multiple txn:*
> > insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> > attr2 = 10);
> > insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> > attr2 = 20);
> >
> > tab1 has ~6M rows and there are only two values for the attribute attr2
> > in tab1, which are evenly distributed. So, yes, I guess I'm already
> > doing batching here.
> >
> > Also, I ran the following two statements to see if their performance
> > is comparable. STMT 1 always runs faster on my machine, but their
> > performance differs by a couple of seconds at most.
> >
> > STMT 1: select attr1, attr2 into tab2 from tab1;
> > STMT 2: insert into tab2 (select attr1, attr2 from tab1);
>
> All I have left is:
>
> select row_number() over (order by attr2, attr1) as id, attr1,
> attr2 into tab2 from tab1;
>
> That will not create a serial type in the id column though. You can
> attach a sequence to that column. Something like:
>
> 1) create sequence tab2_id start <max id + 1> owned by tab2.id;
>
> 2) alter table tab2 alter COLUMN id set default nextval('tab2_id');
>
> >
> > However, adding the serial id column as an ALTER TABLE statement
> > actually takes more time than inserting the tuples, so the combined
> > total time is more than double the time taken to insert the tuples
> > into tab2 without a serial id column.
> >
> > Best,
> > -SB
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
