Grave performance issues...

From: "Ztream" <ztream(at)highrad(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Grave performance issues...
Date: 2001-12-27 19:03:21
Message-ID: a0fr1a$1eil$1@news.tht.net
Lists: pgsql-general

I am porting a system that I have been developing on an MS platform
(including SQL Server) to a Linux system with PostgreSQL. Apart from the
usual difficulties (and not being very knowledgeable in the Unix world), I
have now gotten most things to work. However, I am experiencing some pretty
serious performance issues.
The system collects data from users through a web interface, and also has a
background process that does rather heavy work. To keep things focused, I
will refer here to just one of the SQL statements. The idea is that the
background process is, among other things, constructing/updating a table
called Distribution from a table called Data; Distribution then represents
the statistical distribution of the information found in Data (using user
groups, too).
The UPDATE statement that actually performs this looks like:

UPDATE Distribution
SET Value =
(
    SELECT COUNT(*)
    FROM Data
    INNER JOIN GroupMember ON Data.UserID = GroupMember.UserID
    WHERE ItemID = Distribution.ItemID
      AND Grade = Distribution.Grade
      AND WeightGroupID = Distribution.WeightGroupID
);
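For comparison, the same counts could be computed in a single pass with a
grouped aggregate and PostgreSQL's nonstandard UPDATE ... FROM extension,
instead of re-running the subquery for every Distribution row. This is only
a sketch - it assumes the unqualified ItemID, Grade, and WeightGroupID in
the subquery resolve to columns of Data:

```sql
-- Sketch only: assumes Data carries ItemID, Grade, and WeightGroupID,
-- matching the unqualified names in the correlated subquery above.
-- Aggregate once, then join the counts back in a single UPDATE.
UPDATE Distribution
SET Value = d.cnt
FROM (
    SELECT Data.ItemID, Data.Grade, Data.WeightGroupID,
           COUNT(*) AS cnt
    FROM Data
    INNER JOIN GroupMember ON Data.UserID = GroupMember.UserID
    GROUP BY Data.ItemID, Data.Grade, Data.WeightGroupID
) AS d
WHERE Distribution.ItemID = d.ItemID
  AND Distribution.Grade = d.Grade
  AND Distribution.WeightGroupID = d.WeightGroupID;
```

Note one behavioral difference: this form only touches Distribution rows
that have at least one matching Data row, whereas the correlated subquery
sets unmatched rows to 0.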

Distribution takes the form:
CREATE TABLE Distribution (
Grade int NOT NULL,
ItemID numeric(18, 0) NOT NULL,
WeightGroupID numeric(18, 0) NOT NULL,
Value float NOT NULL,
PRIMARY KEY (Grade, ItemID, WeightGroupID)
);
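Since the correlated subquery is re-evaluated for every row of Distribution,
indexes on the join and filter columns of the other tables may matter once
they grow. A sketch - the column placement is an assumption based on the
unqualified names in the UPDATE's subquery, and with only a few dozen rows
the planner may ignore these anyway:

```sql
-- Hypothetical indexes; which table holds ItemID/Grade/WeightGroupID
-- is assumed from the unqualified names in the subquery.
CREATE INDEX data_item_grade_wg ON Data (ItemID, Grade, WeightGroupID);
CREATE INDEX groupmember_userid ON GroupMember (UserID);
```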

I am unsure if I need to post the DDL descriptions of the other tables; feel
free to ask.

Anyway, there are two seemingly distinct problems with this:

*1
As you can see, this UPDATE touches the entire table on each execution,
which in PostgreSQL seems to mean that performance quickly deteriorates
unless you are constantly vacuuming. To answer an obvious question: I *do*
feel this is necessary - if I were to update only those rows that would
acquire a new value, it would most probably still be the majority of them,
as the system is expected to receive a steady flow of input data once
launched.
This operation also needs to be performed very often - ideally about every
10 seconds. Over the holiday I left the background process running for a
few days without vacuuming, and while testing today I found that the above
query takes approximately 2 minutes to complete - for a Distribution size
of 600 rows! Also, neither of the other two tables referenced contained
more than 30 rows.
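For reference, this is the maintenance that whole-table UPDATEs seem to
demand here, since each one leaves a dead copy of every row behind and 7.1
only reclaims them on an explicit vacuum:

```sql
-- Reclaim dead tuples left by whole-table UPDATEs and refresh the
-- planner's statistics; in 7.1 this must be run explicitly.
VACUUM ANALYZE Distribution;
```

The same thing can be scheduled from cron via the vacuumdb wrapper
(vacuumdb --analyze) rather than run by hand.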

*2
Even when the involved tables are freshly vacuumed (and VACUUM ANALYZEd),
the above query, with the data amounts from (*1), takes about 4 seconds to
complete. Compare that to the roughly 50 ms it takes for the same amount of
data on Windows 2000 against SQL Server on a comparable machine.

Needless to say, this is rather catastrophic, and I'm pretty much out of
ideas. The performance figures given above are from executing the query
directly in psql, with no other connections to the database, so this does
not seem to be a problem with the system itself. A configuration problem,
perhaps? A bug? SQL ignorance on my part? I could surely use some help.
As previously noted, I am not very comfortable with Unix systems, and not
exactly an SQL guru either.
The Linux distribution is some version of Red Hat, and the PostgreSQL
version is 7.1.3.

Any help or attempt to help would be greatly appreciated.

/ Ztream
