Re: improvement suggestions for performance design

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: tfinneid(at)ifi(dot)uio(dot)no
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: improvement suggestions for performance design
Date: 2007-07-05 12:59:30
Message-ID: 468CEB32.30109@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I would strongly suggest that you use a proper relational schema,
instead of storing everything in two tables. I don't know your
application, but a schema like that is called an Entity-Attribute-Value
(though your entity seems to be just posx and posy) and it should raise
a big red flag in the mind of any database designer. In particular,
constructing queries against an EAV schema is a major pain in the ass.
This has been discussed before on postgresql lists as well, you might
want to search and read the previous discussions.

Ignoring the EAV issue for a moment, it's hard to give advice without
knowing what kind of queries are going to executed. Are the lookups
always going to be by id? By posx/posy perhaps? By attribute?

tfinneid(at)ifi(dot)uio(dot)no wrote:
> Hi
>
> I have the following scenario for a database that I need to design, and
> would like some hints on what to improve or do differently to achieve the
> desired performance goal, disregarding hardware and postgres tuning.
>
> The premise is an attribute database that stores about 100 different
> attribute types as attribute values. Every X seconds, Y number of new
> attribute values are stored in the database. X is constant and currently
> between 6 and 20 seconds, depending on the setup. In the future X could
> become as low as 3 seconds. Y can, within the next 5-10 years, become as
> high as 200 000.
>
> That means that for example, every 6 seconds 100 000 attributes needs to
> be written to the database.
>
> At the same time, somewhere between 5-20 users needs to read parts of
> those newly written attributes, maybe in total 30 000 attributes.
>
> This continues for the duration of the field operation, which could be
> 18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.
>
> Now here is how I suggest doing this:
>
> 1- the tables
>
> table attribute_values:
> id int
> attr_type int ( references attribute_types(id) )
> posX int
> posY int
> data_type int
> value varchar(50)
>
> table attribute_types:
> id int
> name varchar(200);
>
>
>
> 2- function
>
> a function that receives an array of data and inserts each attribute.
> perhaps one array per attribute data (type, posX, posY, data_type,
> value) so five arrays as in parameters ot the function
>
> 3- java client
>
> the client receives the data from a corba request, and splits it
> into, say 4 equally sized blocks and executes 4 threads that insert
> each block (this seems to be more efficient than just using one
> thread.)
>
> Now I am wondering if this is the most efficient way of doing it?
>
> - I know that I could group the attributes so that each type of attribute
> gets its own table with all attributes in one row. But I am not sure if
> that is any more efficient than ont attribute per row since I pass
> everything to the function as an array.
> With the above design a change in attribute types only requires changing
> the data in a table instead of having to modify the client, the function
> and the tables.
>
> - I am also wondering if writing the client and function in C would create
> a more efficient solution.
>
> any comments?
>
> ps, I am currently running postgres 8.1, but could probably use 8.2 if it
> is needed for functionality or performance reasons. It will run on a sparc
> machine with solaris 10 and perhaps 4-6 processors, as many GB of RAM as
> necessary and SCSI disks ( perhaps in raid 0 ).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message tfinneid 2007-07-05 13:35:57 Re: improvement suggestions for performance design
Previous Message tfinneid 2007-07-05 12:15:48 improvement suggestions for performance design