improvement suggestions for performance design

From: tfinneid(at)ifi(dot)uio(dot)no
To: pgsql-performance(at)postgresql(dot)org
Subject: improvement suggestions for performance design
Date: 2007-07-05 12:15:48
Message-ID: 40362.134.32.140.234.1183637748.squirrel@webmail.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve the
desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently
between 6 and 20 seconds, depending on the setup. In the future X could
become as low as 3 seconds. Y can, within the next 5-10 years, become as
high as 200 000.

That means that for example, every 6 seconds 100 000 attributes needs to
be written to the database.

At the same time, somewhere between 5-20 users needs to read parts of
those newly written attributes, maybe in total 30 000 attributes.

This continues for the duration of the field operation, which could be
18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.

Now here is how I suggest doing this:

1- the tables

table attribute_values:
id int
attr_type int ( references attribute_types(id) )
posX int
posY int
data_type int
value varchar(50)

table attribute_types:
id int
name varchar(200);

2- function

a function that receives an array of data and inserts each attribute.
perhaps one array per attribute data (type, posX, posY, data_type,
value) so five arrays as in parameters ot the function

3- java client

the client receives the data from a corba request, and splits it
into, say 4 equally sized blocks and executes 4 threads that insert
each block (this seems to be more efficient than just using one
thread.)

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of attribute
gets its own table with all attributes in one row. But I am not sure if
that is any more efficient than ont attribute per row since I pass
everything to the function as an array.
With the above design a change in attribute types only requires changing
the data in a table instead of having to modify the client, the function
and the tables.

- I am also wondering if writing the client and function in C would create
a more efficient solution.

any comments?

ps, I am currently running postgres 8.1, but could probably use 8.2 if it
is needed for functionality or performance reasons. It will run on a sparc
machine with solaris 10 and perhaps 4-6 processors, as many GB of RAM as
necessary and SCSI disks ( perhaps in raid 0 ).

regards

thomas

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Heikki Linnakangas 2007-07-05 12:59:30 Re: improvement suggestions for performance design
Previous Message Dimitri 2007-07-05 12:01:14 Re: Filesystem Direct I/O and WAL sync option