Re: How many insert + update should one transaction

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: Yonatan Ben-Nes <da(at)canaan(dot)co(dot)il>, pgsql-general(at)postgresql(dot)org
Subject: Re: How many insert + update should one transaction
Date: 2005-09-24 21:22:22
Message-ID: 20050924212222.GN7630@pervasive.com
Lists: pgsql-general

Just remember the first rule of performance tuning: don't.

Unless you *know* having the dead rows will be an issue, you will almost
certainly be best off going the simple, straightforward route.

On Fri, Sep 23, 2005 at 10:49:00AM -0500, Scott Marlowe wrote:
> On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> > Hi all,
> >
> > Every few days I need to DELETE all of the content of a few tables and
> > INSERT new data into them.
> > The amount of new data is about 5 million rows, and each row gets about 3
> > queries (INSERT + UPDATE).
> > Now, because I need the old data to be displayed until all of the new data
> > is available, I'm doing the whole process of deleting the old
> > content and inserting the new one in a single transaction.
> > Should I divide the insertion so that I insert the new data into a
> > temporary table and commit the transaction every
> > 100, 1,000, or 10,000 queries? Or does the size of the transaction not
> > matter to the server, so it's OK to handle such a process in
> > one transaction?
>
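The single-transaction approach described above might look roughly like this
(the table name, file name, and follow-up UPDATE are made up for the example):

    -- "items", "status", and the file path are hypothetical names for illustration
    BEGIN;
    DELETE FROM items;                       -- old rows stay visible to other sessions until COMMIT
    COPY items FROM '/tmp/new_items.dat';    -- bulk-load the ~5 million replacement rows
    UPDATE items SET status = 'ready';       -- the extra per-row work mentioned above
    COMMIT;                                  -- readers switch from the old data to the new atomically
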
> The only possible issue would be one of capacity, and possibly having a
> lot of dead tuples laying about.
>
> If you have 5 million rows, and you update every one, then you now have
> 5 million live and 5 million dead tuples in your database. A Vacuum
> full will take quite a while.
>
> If your fsm is set large enough, then as long as you vacuum (regular,
> non-full vacuum) between these transactions, the 5 million dead
> tuples should get reused. However, the performance of your database
> for selects and such will be like that of a 10 million row database.
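A minimal sketch of that maintenance step ("items" is the same hypothetical
table as above, and the configuration values are illustrative, not
recommendations):

    -- run between the bulk replacements; a plain VACUUM marks the ~5 million
    -- dead row versions as reusable (VACUUM VERBOSE also reports how many it found)
    VACUUM ANALYZE items;

    -- postgresql.conf: the free space map has to be big enough to remember
    -- that many pages, or the space will not actually be reused
    -- max_fsm_pages     = 200000
    -- max_fsm_relations = 1000
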
>
> Given that you NEED to have all 10 million tuples in the database at the
> same time, the use of a temp / holding table would allow you to truncate
> the main table, move everything into the main table, and then drop /
> truncate the temp / holding table.
>
> If you truncate the main table, then initiate another transaction to
> move the data into it, it shouldn't be so bloated, but the down side is
> you'll have a period of time when it appears empty to users.
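A sketch of that holding-table variant (all object names are hypothetical);
the expensive load happens off to the side, and only the last steps touch
the main table:

    -- "items_load", "items", "status", and the file path are made-up names
    CREATE TABLE items_load (LIKE items);        -- same structure as the main table
    COPY items_load FROM '/tmp/new_items.dat';   -- the slow part; no lock held on "items"
    UPDATE items_load SET status = 'ready';

    TRUNCATE items;                              -- its own transaction: the table briefly looks empty

    BEGIN;
    INSERT INTO items SELECT * FROM items_load;  -- repopulate without leaving dead tuples behind
    COMMIT;

    DROP TABLE items_load;
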
>
> So, the real question is whether or not you can afford to have an empty
> table at some point in the process.
>
> If you can't, then the two methods (running the whole transaction against
> the one table or using the temp / holding table) are equivalent. If you
> can, there should be a noticeable gain from truncating the
> main table outside the update transaction.
>
> If you need that table to always have the old or new tuples (i.e. never
> be empty) and you can afford the very lengthy vacuum full on the 5
> million dead rows, then doing everything in place against the one table
> and vacuuming afterwards will give you the best select performance for
> the rest of the day.
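In that case the cleanup afterwards would be something like (again with the
made-up table name):

    VACUUM FULL ANALYZE items;   -- compacts away the ~5 million dead row versions,
                                 -- but holds an exclusive lock while it runs
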
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
