Re: Performance problem with row count trigger

From: Tony Cebzanov <tonyceb(at)andrew(dot)cmu(dot)edu>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Performance problem with row count trigger
Date: 2009-04-03 18:22:32
Message-ID: 49D653E8.7040008@andrew.cmu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Greg Sabino Mullane wrote:
> A few things spring to mind:
>
> 1) Use a separate table, rather than storing things inside of
> dataset itself. This will reduce the activity on the dataset table.

A separate table just for that one column? Would that really help,
given that I'd have to add the foreign key dataset_id to the related
table? How does splitting activity across dataset and, say,
dataset_counts help things?

> 2) Do you really need bigint for the counts?

Probably not. Still, changing to INTEGER hasn't changed the performance
in any measurable way.

> 3) If you do want to do this, you'll need a different approach as
> Tom mentioned. One way to do this is to have a special method for
> bulk loading, that gets around the normal updates and requires that
> the user take responsiblity for knowing when and how to call the
> alternate path. The basic scheme is this:
>
> 1. Disable the normal triggers
> 2. Enable special (perl) triggers that keep the count in memory
> 3. Do the bulk changes
> 4. Enable normal triggers, disable special perl one
> 5. Do other things as needed....
> 6. Commit the changes to the assoc_count field.

I gave this a shot, and my initial testing looks very promising. Using
your scheme, the performance looks to be just as good as without the
trigger.

I haven't done any kind of edge case testing to see if weird things
happen when multiple datasets are added simultaneously, or when someone
inserts an assoc record out-of-band while a bulk dataset load is
happening, but you've certainly got me well on my way to a workable
solution. Many thanks!

There's one part I don't get, though...

> Number 6 can be done anytime, as long as you are in the same session. The danger
> is in leaving the session without calling the final function. This can be
> solved with some deferred FK trickery, or by careful scripting of the events.
> All this doesn't completely remove the pain, but it may shift it around enough
> in useful ways for your app.

I'm not sure I understand the danger you're talking about here. Doesn't
putting the whole start_bulkload_assoc_count(), bulk insert, and
end_bulkload_assoc_count() process in a transaction save me from any of
these problems? Or is there some more subtle problem I'm not seeing?

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Peter Willis 2009-04-03 19:10:02 Re: FUNCTION problem
Previous Message Adrian Klaver 2009-04-03 16:59:16 Re: FUNCTION problem