From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: Wei Weng <wweng(at)kencast(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance of insert/delete/update
Date: 2002-11-21 23:34:53
Message-ID: 200211211534.53358.josh@agliodbs.com
Lists: pgsql-hackers pgsql-performance
Scott,
> > This only works up to the limit of the memory you have available for
> > Postgres. If the updates in one transaction exceed your available
> > memory, you'll see a lot of swaps to disk log that will slow things
> > down by a factor of 10-50 times.
>
> Sorry, but that isn't true. MVCC means we don't have to hold all the data
> in memory, we can have multiple versions of the same tuples on disk, and
> use memory for what it's meant for, buffering.
Sorry, you're absolutely correct. I don't know what I was thinking of; that's
the problem with an off-the-cuff response.
Please disregard the previous quote. Instead:
Doing several large updates in a single transaction can lower performance if
the number of updates is sufficient to affect index usability and a VACUUM is
really needed between them. For example, a series of large data
transformation statements on a single table or set of related tables should
have VACUUM statements between them, which prevents you from putting them all
in a single transaction.
For example, the series:
1. INSERT 10,000 ROWS INTO table_a;
2. UPDATE 100,000 ROWS IN table_a WHERE table_b;
3. UPDATE 100,000 ROWS IN table_c WHERE table_a;
will almost certainly need a VACUUM, or even a VACUUM FULL table_a, after step
2), requiring you to split the series into two transactions. Otherwise, the
"where table_a" condition in step 3) will be extremely slow.
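To make the split concrete, here is a sketch of that series as two
transactions with a VACUUM between them. The table and column names, the
staging source, and the join conditions are all hypothetical, invented to
fill in the shorthand above; the point is only the shape of the split.
Note that VACUUM cannot run inside a transaction block, which is exactly why
the series cannot be one big transaction:

```sql
-- Transaction 1: the bulk insert and the first large update.
BEGIN;
INSERT INTO table_a SELECT * FROM staging_a;            -- ~10,000 rows (hypothetical source)
UPDATE table_a SET flag = true
  FROM table_b WHERE table_a.id = table_b.a_id;         -- ~100,000 rows
COMMIT;

-- Reclaim dead row versions and refresh statistics before the next step.
VACUUM ANALYZE table_a;

-- Transaction 2: the update whose WHERE clause depends on table_a.
BEGIN;
UPDATE table_c SET val = table_a.val
  FROM table_a WHERE table_c.a_id = table_a.id;         -- ~100,000 rows
COMMIT;
```

Without the intervening VACUUM, the dead tuples left by the first two
statements make the join against table_a in the final UPDATE far more
expensive.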
> Also note that many folks have replaced foreign keys with triggers and
> gained in performance, as fks in pgsql still have some deadlock issues to
> be worked out.
Yeah. I think Neil Conway is overhauling FKs, which everyone considers a bit
of a hack in the current implementation, including Jan who wrote it.
> > It can be dangerous though ... in the event of a power outage, for
> > example, your database could be corrupted and difficult to recover. So
> > ... "at your own risk".
>
> No, the database will not be corrupted, at least not in my experience.
> however, you MAY lose data from transactions that you thought were
> committed. I think Tom posted something about this a few days back.
Hmmm ... have you done this? I'd like the performance gain, but I don't want
to risk my data integrity. I've seen some awful things in databases (such as
duplicate primary keys) from yanking a power cord repeatedly.
> update table set field1=field1+1
>
> are killers in an MVCC database as well.
Yeah -- don't I know it.
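For anyone following along: under MVCC, an UPDATE writes a new version of
every affected row rather than overwriting in place, so a full-table
increment like that roughly doubles the table's heap footprint until VACUUM
reclaims the dead versions. A minimal sketch (hypothetical table name):

```sql
-- Each run rewrites every row; the old versions remain as dead tuples.
UPDATE counters SET field1 = field1 + 1;

-- Reclaim the dead row versions (and refresh planner stats) afterwards:
VACUUM ANALYZE counters;
```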
--
-Josh Berkus
Aglio Database Solutions
San Francisco