Re: Savepoints in transactions for speed?

From: "Franklin, Dan" <dan(dot)franklin(at)pearson(dot)com>
To: Steve Atkins <steve(at)blighty(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Savepoints in transactions for speed?
Date: 2012-11-28 03:08:44
Message-ID: CAKWqM3V0EC+P5jdu3xcOuTT0mJzB8vPRfwM7akPtWTBR4dvC9A@mail.gmail.com
Lists: pgsql-performance

On Tue, Nov 27, 2012 at 6:26 PM, Steve Atkins <steve(at)blighty(dot)com> wrote:

>
> On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
> wrote:
>
> > I need to delete about 1.5 million records from a table and reload it in
> one transaction. The usual advice when loading with inserts seems to be
> group them into transactions of around 1k records. Committing at that
> point would leave the table in an inconsistent state.
>
> I'd probably just do the whole thing in one transaction.
>
> Do you have specific reasons you want to avoid a long transaction, or just
> relying on rules of thumb? Postgresql isn't going to run out of resources
> doing a big transaction, in the way some other databases will.
>
> Long running transactions will interfere with vacuuming, but inserting a
> couple of million rows shouldn't take that long.
>
> > Would issuing a savepoint every 1k or so records negate whatever
> downside there is to keeping a transaction open for all 1.5 million
> records, or just add more overhead?
>
>
> Savepoints are going to increase overhead and have no effect on the length
> of the transaction. If you want to catch errors and not have to redo the
> entire transaction, they're great, but that's about it.
>
> > The data to reload the table is coming from a Perl DBI connection to a
> different database (not PostgreSQL) so I'm not sure the COPY alternative
> applies here.
>
> COPY works nicely from perl:
>
> $dbh->do("COPY foo FROM STDIN");
> $dbh->pg_putcopydata("foo\tbar\tbaz\n");
> $dbh->pg_putcopyend();
>
> The details are in DBD::Pg. I use this a lot for doing big-ish (tens of
> millions of rows) bulk inserts. It's not as fast as you can get, but it's
> probably as fast as you can get with perl.
>
> Cheers,
> Steve

I do this as well - insert a few million rows into a table using the
DBD::Pg copy interface. It works well.

I ended up batching the copies so that each COPY statement only does a few
hundred thousand rows at a time, but it's all one transaction.
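
Roughly, the loop looks like this (table name, source statement handle, and
batch size are placeholders, and the join assumes values with no tabs,
newlines, or backslashes in them):

$dbh->begin_work;                       # one transaction for the whole delete-and-reload
$dbh->do("DELETE FROM foo");

my $batch = 250_000;                    # rows per COPY statement
my $rows  = 0;
$dbh->do("COPY foo FROM STDIN");
while (my @row = $src_sth->fetchrow_array) {
    $dbh->pg_putcopydata(join("\t", @row) . "\n");
    if (++$rows % $batch == 0) {
        $dbh->pg_putcopyend();            # finish this COPY statement...
        $dbh->do("COPY foo FROM STDIN");  # ...and start the next one
    }
}
$dbh->pg_putcopyend();
$dbh->commit;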

The batching was necessary because of an idiosyncrasy of COPY in Pg 8.1:
each COPY statement's contents were buffered in a malloc'd space, and if
there were several million rows buffered up, the allocated virtual memory
could get quite large - as in several GB. That allocation plus the buffer
pool sometimes exceeded the amount of RAM I had available at the time
(several years ago), with bad effects on performance.

This may have been fixed since then, or maybe RAM's gotten big enough that
it's not a problem.
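
For what it's worth, if per-batch error recovery ever does matter, DBD::Pg
exposes savepoints directly (pg_savepoint / pg_rollback_to / pg_release), so
each batch can be wrapped in one without giving up the single transaction.
A rough sketch, assuming RaiseError is on and the batch lines are already
formatted for COPY:

$dbh->pg_savepoint("batch");
eval {
    $dbh->do("COPY foo FROM STDIN");
    $dbh->pg_putcopydata($_) for @batch_lines;
    $dbh->pg_putcopyend();               # data errors are reported here
    $dbh->pg_release("batch");           # batch is good, discard the savepoint
};
if ($@) {
    $dbh->pg_rollback_to("batch");       # undo just this batch; transaction stays open
    warn "batch failed: $@";
}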

Dan Franklin
