Re: High update activity, PostgreSQL vs BigDBMS

From: Ron <rjpeace(at)earthlink(dot)net>
To: Guy Rouillier <guyr-ml1(at)burntmail(dot)com>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: High update activity, PostgreSQL vs BigDBMS
Date: 2006-12-29 12:52:59
Message-ID: E1H0HFH-0004cw-6E@elasmtp-dupuy.atl.sa.earthlink.net
Lists: pgsql-performance

At 12:46 AM 12/28/2006, Guy Rouillier wrote:
>I don't want to violate any license agreement by discussing
>performance, so I'll refer to a large, commercial
>PostgreSQL-compatible DBMS only as BigDBMS here.
>
>I'm trying to convince my employer to replace BigDBMS with
>PostgreSQL for at least some of our Java applications. As a proof
>of concept, I started with a high-volume (but conceptually simple)
>network data collection application. This application collects
>files of 5-minute usage statistics from our network devices, and
>stores a raw form of these stats into one table and a normalized
>form into a second table. We are currently storing about 12 million
>rows a day in the normalized table, and each month we start new
>tables. For the normalized data, the app inserts rows initialized
>to zero for the entire current day first thing in the morning, then
>throughout the day as stats are received, executes updates against
>existing rows. So the app has very high update activity.
>
>In my test environment, I have a dual-x86 Linux platform running the
>application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS
>and PostgreSQL 8.2.0 (only one at a time.) The Sun box has 4 disk
>arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000,
>for those familiar with these devices.) The arrays are set up with
>RAID5. So I'm working with a consistent hardware platform for this
>comparison. I'm only processing a small subset of files (144.)
>
>BigDBMS processed this set of data in 20000 seconds, with all
>foreign keys in place. With all foreign keys in place, PG took
>54000 seconds to complete the same job. I've tried various
>approaches to autovacuum (none, 30-seconds) and it doesn't seem to
>make much difference. What does seem to make a difference is
>eliminating all the foreign keys; in that configuration, PG takes
>about 30000 seconds. Better, but BigDBMS still has it beat significantly.

If you are running pg with its default configuration, you are not
getting pg's best performance. Ditto if you're using data structures
optimized for BigDBMS.

A= Go through each query and see what work_mem needs to be for that
query to be as RAM-resident as possible. If you have enough RAM, set
work_mem that large for that query. Remember that work_mem is
allocated =per sort or hash operation=, not per connection, so
queries running in parallel eat the sum of their work_mem allocations.
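You can raise it for just one session or statement, e.g. something
like this (the value and the table name are made up, purely for
illustration):

    -- Give this session's big sorts/hashes more room to stay in RAM
    SET work_mem = '256MB';
    SELECT device_id, sum(bytes_in)
      FROM stats_norm              -- hypothetical stats table
     GROUP BY device_id;
    RESET work_mem;                -- back to the postgresql.conf value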

B= Make sure shared_buffers is set reasonably. A good rule of thumb
for 8.x is that shared_buffers should be at least ~1/4 of your RAM.
If your E4500 is maxed out on RAM, there's a good chance
shared_buffers should be considerably more than 1/4 of RAM.
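As a sketch, assuming (purely for illustration) 8GB of RAM in the
box, something like:

    # postgresql.conf -- illustrative values, not recommendations;
    # changing shared_buffers requires a restart
    shared_buffers = 2GB           # ~1/4 of an assumed 8GB of RAM
    effective_cache_size = 6GB     # roughly what the OS cache can hold
    checkpoint_segments = 32       # helps update-heavy workloads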

C= What file system are you using? Unlike BigDBMS, pg does not have
its own native one, so you have to choose the one that best suits
your needs. For update-heavy applications involving lots of small
updates, jfs and XFS should both be seriously considered.
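On Linux, mounting the data filesystem noatime also saves an inode
write per read. A hypothetical /etc/fstab line (device and mount
point are made up):

    /dev/sdc1   /pgdata   xfs   noatime   0 2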

D= Your table schema and physical table layout probably need to
change. What BigDBMS likes here is most likely different from what pg likes.
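For instance, since you start new tables each month anyway, pg's
constraint-exclusion partitioning (8.1+) may fit. A minimal sketch,
with hypothetical table and column names:

    -- Hypothetical normalized stats table; the parent holds no rows,
    -- each month gets a child table the planner can skip
    CREATE TABLE stats_norm (
        device_id  integer   NOT NULL,
        stat_time  timestamp NOT NULL,
        bytes_in   bigint DEFAULT 0,
        bytes_out  bigint DEFAULT 0
    );
    CREATE TABLE stats_norm_2006_12 (
        CHECK (stat_time >= '2006-12-01' AND stat_time < '2007-01-01')
    ) INHERITS (stats_norm);
    -- and in postgresql.conf:  constraint_exclusion = on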

E= pg does not actually update records in place. It appends new
row versions to the table and marks the old version invalid. This
means that things like page size, RAID stripe size, etc. may need
different values than they do for BigDBMS. Another consequence is
that pg likes RAID 10 even more than most of its competitors do.
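One knob that follows directly from this is the new-in-8.2 table
fillfactor: leaving free space on each heap page gives appended row
versions a chance to land on the same page as the old ones. Using
the hypothetical stats_norm table from above (70 is illustrative):

    -- Reserve ~30% of each heap page for updated row versions
    ALTER TABLE stats_norm SET (fillfactor = 70);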

F= This may seem obvious, but how many of the foreign keys and other
overhead do you actually need? Get rid of the unnecessary ones.
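Even for the keys you keep, you can drop them for the bulk load and
re-add them afterward so each inserted row doesn't pay the lookup
cost (constraint and table names below are hypothetical):

    ALTER TABLE stats_norm DROP CONSTRAINT stats_norm_device_fk;
    -- ... run the daily load ...
    ALTER TABLE stats_norm
        ADD CONSTRAINT stats_norm_device_fk
        FOREIGN KEY (device_id) REFERENCES devices (device_id);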

G= Bother the folks at Sun, like Josh Berkus, who know pg inside and
out +and+ know your HW inside and out (or have access to those who
do ;-) ). I'll bet they'll have ideas I'm not thinking of.

H= EXPLAIN ANALYZE is your friend. Slow queries may need better
table statistics, or better SQL, or may be symptoms of issues "C" or
"D" above, or ...

>I've got PG configured so that the system database is on disk
>array 2, as are the transaction log files. The default table space
>for the test database is disk array 3. I've got all the reference
>tables (the tables to which the foreign keys in the stats tables
>refer) on this array. I also store the stats tables on this
>array. Finally, I put the indexes for the stats tables on disk
>array 4. I don't use disk array 1 because I believe it is a software array.

I= With 4 arrays of 12 HDs each, you definitely have enough spindles
to place pg_xlog somewhere separate from all the other pg tables. In
addition, you should analyze your table access patterns and then
scatter the tables across your 4 arrays in such a way as to minimize
head contention.
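In 8.2 the usual way to relocate the WAL is a symlink made while the
postmaster is stopped; all paths below are hypothetical mount points
for your arrays:

    # stop the postmaster first
    mv $PGDATA/pg_xlog /array2/pg_xlog
    ln -s /array2/pg_xlog $PGDATA/pg_xlog

Tablespaces then handle the table and index placement from psql
(index name again hypothetical):

    CREATE TABLESPACE stats_ix LOCATION '/array4/pg_ix';
    ALTER INDEX stats_norm_pkey SET TABLESPACE stats_ix;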

>I'm out of ideas how to improve this picture any further. I'd
>appreciate some suggestions. Thanks.

Hope this helps,

Ron Peacetree
