Re: performace review

From: "Brandon Aiken" <BAiken(at)winemantech(dot)com>
To: "Alexander Staubo" <alex(at)purefiction(dot)net>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: performace review
Date: 2006-10-07 22:38:15
Message-ID: F8E84F0F56445B4CB39E019EF67DACBA2EF536@exchsrvr.winemantech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Denormalization should reduce the number of joins and reduce the overall number of tables, yes? And the idea is to fully normalize and then back off because of physical limitations in the database you're using *with full knowledge and understanding that you're sacrificing the relational model to do it*. They obviously did not do that. They just built a denormalized database.

Look at one of the views:

CREATE OR REPLACE VIEW kernel_view_027 AS
(( SELECT 0 AS object_rid, act.object_oid, act.object_idx, act0."p$$assigned_to__rid" AS "p$$object_parent__rid", act0."p$$assigned_to__oid" AS "p$$object_parent__oid", act.created_at, act.modified_at, act.created_by, act.modified_by, act.object__class, act.access_level_browse, act.access_level_update, act.access_level_delete, act."owner", act.name, act.description, act.percent_complete, act.priority, act.due_by, act.scheduled_start, act.scheduled_end, act.actual_start, act.actual_end, act.misc1, act.misc2, act.misc3, act.activity_state, act.activity_number, act.severity, act.assigned_to, act."p$$assigned_to__rid", act."p$$assigned_to__oid", ( SELECT acc.full_name
FROM kernel_account acc
WHERE acc.object_rid::text = act."p$$assigned_to__rid"::text AND acc.object_oid::text = act."p$$assigned_to__oid"::text AND acc.object_idx = 0) AS "p$$assigned_to_title"
FROM kernel_activity act
LEFT JOIN kernel_activity act0 ON act0.object_idx = 0 AND act.object_rid::text = act0.object_rid::text AND act.object_oid::text = act0.object_oid::text
UNION
SELECT 0 AS object_rid, act.object_oid, act.object_idx, act0."p$$sender_parent__rid" AS "p$$object_parent__rid", act0."p$$sender_parent__oid" AS "p$$object_parent__oid", act.created_at, act.modified_at, act.created_by, act.modified_by, act.object__class, act.access_level_browse, act.access_level_update, act.access_level_delete, act."owner", act.name, act.description, act.percent_complete, act.priority, act.due_by, act.scheduled_start, act.scheduled_end, act.actual_start, act.actual_end, act.misc1, act.misc2, act.misc3, act.activity_state, act.activity_number, act.severity, act.assigned_to, act."p$$assigned_to__rid", act."p$$assigned_to__oid", ( SELECT acc.full_name
FROM kernel_account acc
WHERE acc.object_rid::text = act."p$$assigned_to__rid"::text AND acc.object_oid::text = act."p$$assigned_to__oid"::text AND acc.object_idx = 0) AS "p$$assigned_to_title"
FROM kernel_activity act
LEFT JOIN kernel_activity act0 ON act0.object_idx = 0 AND act.object_rid::text = act0.object_rid::text AND act.object_oid::text = act0.object_oid::text)
UNION
SELECT 0 AS object_rid, act.object_oid, act.object_idx, p0."p$$party__rid" AS "p$$object_parent__rid", p0."p$$party__oid" AS "p$$object_parent__oid", act.created_at, act.modified_at, act.created_by, act.modified_by, act.object__class, act.access_level_browse, act.access_level_update, act.access_level_delete, act."owner", act.name, act.description, act.percent_complete, act.priority, act.due_by, act.scheduled_start, act.scheduled_end, act.actual_start, act.actual_end, act.misc1, act.misc2, act.misc3, act.activity_state, act.activity_number, act.severity, act.assigned_to, act."p$$assigned_to__rid", act."p$$assigned_to__oid", ( SELECT acc.full_name
FROM kernel_account acc
WHERE acc.object_rid::text = act."p$$assigned_to__rid"::text AND acc.object_oid::text = act."p$$assigned_to__oid"::text AND acc.object_idx = 0) AS "p$$assigned_to_title"
FROM kernel_activity act
JOIN kernel_activityparty p0 ON p0.object_idx = 0 AND p0."p$$object_parent__rid"::text = act.object_rid::text AND p0."p$$object_parent__oid"::text = act.object_oid::text)
UNION
SELECT 0 AS object_rid, act.object_oid, act.object_idx, p0."p$$party_parent__rid" AS "p$$object_parent__rid", p0."p$$party_parent__oid" AS "p$$object_parent__oid", act.created_at, act.modified_at, act.created_by, act.modified_by, act.object__class, act.access_level_browse, act.access_level_update, act.access_level_delete, act."owner", act.name, act.description, act.percent_complete, act.priority, act.due_by, act.scheduled_start, act.scheduled_end, act.actual_start, act.actual_end, act.misc1, act.misc2, act.misc3, act.activity_state, act.activity_number, act.severity, act.assigned_to, act."p$$assigned_to__rid", act."p$$assigned_to__oid", ( SELECT acc.full_name
FROM kernel_account acc
WHERE acc.object_rid::text = act."p$$assigned_to__rid"::text AND acc.object_oid::text = act."p$$assigned_to__oid"::text AND acc.object_idx = 0) AS "p$$assigned_to_title"
FROM kernel_activity act
JOIN kernel_activityparty p0 ON p0.object_idx = 0 AND p0."p$$object_parent__rid"::text = act.object_rid::text AND p0."p$$object_parent__oid"::text = act.object_oid::text;

That's *horrible*. There's typecasting on the join conditions to convert things to text! If there's a join on it, not only should they already be the same type, there ought to be a foreign key constraint on it (even if it is a self-referencing table). The silly thing UNIONs the exact same query four times in order to create a heirarchy! And it has the gloriously descriptive name 'kernel_view_027' (all 19 views share this naming convention).

While creating the tables, I got three errors about an unknown datatype. Yeah, that's great design.

110 of the 118 tables have these fields:
object_rid varchar(200) NOT NULL,
"p$$object_parent__rid" varchar(200),
object_oid varchar(200) NOT NULL,
"p$$object_parent__oid" varchar(200),
object_idx int4 NOT NULL,
object__class varchar(200),
modified_at varchar(20),
created_at varchar(20),
modified_by varchar(20),
created_by varchar(20),
"owner" varchar(50),
access_level_browse int4,
access_level_update int4,
access_level_delete int4,

And this key:
CONSTRAINT kernel_media_pkey PRIMARY KEY (object_rid, object_oid, object_idx)

Wow, yeah. "modified_at" and "created_at". Those should definitely not be timestamps. "owner". Great field name, that. The only keys that don't allow NULLs are the primaries. Clearly each of the 110 tables will need all these fields, then.

Looking at the indexes, the vast majority of them are against one of these 14 universal fields, but only exist on some of the tables.

It's *badly designed*. I've been at this for less than three months and *I* can see it. They're using a relational database as an object database. No wonder their performance sucks!

Brandon Aiken

________________________________

From: Alexander Staubo [mailto:alex(at)purefiction(dot)net]
Sent: Sat 10/7/2006 5:59 PM
To: Brandon Aiken
Cc: Tomi NA; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] performace review

On Oct 7, 2006, at 23:44 , Brandon Aiken wrote:

> I download the db source (inside opencrx-1.9.1-
> core.postgresql-8.zip) and executed their three schema files,
> dbcreate-indexes.sql, dbcreate-views.sql, dbcreate-tables.sql.
> Each of the 118 tables has a three-field composite primary key of
> 'PRIMARY KEY (object_rid, object_oid, object_idx)'. object_rid and
> object_oid are both VARCHAR(200). There are *no* foreign key
> constraints. Each table has between 15 and 50 fields, with 25
> looking about average.

To be fair, there are a bunch of indexes, but the number of indexes
seems low compared to the number of fields.

> Gee, why to table joins take so long? Maybe because a blind monkey
> created the schema? Normalized databases do tend to perform
> better, so I hear.

*De*normalization is the traditional hack to speed up queries,
because it reduces the need for joins.

Alexander.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Browne 2006-10-07 22:41:46 Re: performace review
Previous Message Joshua D. Drake 2006-10-07 22:23:58 Re: performace review