Re: New system recommendations

From: "Benjamin Krajmalnik" <kraj(at)illumen(dot)com>
To: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>
Cc: "Ben K(dot)" <bkim(at)coe(dot)tamu(dot)edu>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: New system recommendations
Date: 2006-04-28 15:53:14
Message-ID: 8511B4970E0D124898E973DF496F9B432515E1@stash.stackdump.local
Lists: pgsql-admin

Concerning the aggregation, no - I am not running aggregate functions on
the database itself.

Let's take a small example. Let's say I am aggregating avg ping time on
an hourly basis.

Inside the table definitions, I have columns declared as array[24]
structures.

As data comes in, I retrieve the test record for the particular test,
use the test time to establish which array offset needs to be updated,
and then perform the relevant computations.

For example, I have a testcounter array and a testresult array. When a
new event comes in, the new average is (testresult[n] * testcounter[n]
+ newvalue) / (testcounter[n] + 1), followed by testcounter[n] += 1.
So I am not loading down the database with aggregate functions.
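
To make that concrete, here is a minimal sketch of the update as a
single SQL statement (the table and column names are hypothetical, not
the real schema; n, newvalue, and p_testid would be plpgsql variables):

    -- Hypothetical schema: tests(testid int, testresult float8[],
    -- testcounter int[]). All right-hand-side expressions see the
    -- pre-update row, so the old counter feeds the average and is
    -- then incremented.
    UPDATE tests
       SET testresult[n]  = (testresult[n] * testcounter[n] + newvalue)
                            / (testcounter[n] + 1),
           testcounter[n] = testcounter[n] + 1
     WHERE testid = p_testid;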

On the UI side of things, now that everything is working properly, I am
also optimizing code. The current test code was looping through the
array elements and aggregating through the backend. Of course, this is
inefficient: if, for example, I am showing an hourly graph, that means
24 selects. Right now we are recoding to have one select which returns
the aggregate of all of the offsets at once. It would be great if we
could aggregate an array in one shot, but in the meantime we'll do it
this way.
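
For what it's worth, a sketch of the one-select version (hypothetical
names again; generate_series() supplies the 24 offsets server-side, so
the loop disappears from the client):

    -- One round trip instead of 24: every hourly offset in one result set.
    SELECT i              AS hour_offset,
           testresult[i]  AS avg_ping,
           testcounter[i] AS samples
      FROM tests, generate_series(1, 24) AS i
     WHERE testid = 42;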

________________________________

From: Scott Marlowe [mailto:smarlowe(at)g2switchworks(dot)com]
Sent: Friday, April 28, 2006 9:24 AM
To: Benjamin Krajmalnik
Cc: Ben K.; pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] New system recommendations

I'm putting both the private email and the thread back on the list, as
there's interesting data in here people could use. I don't think I'm
betraying any trust here, but if I am, please, let me know...

On Thu, 2006-04-27 at 17:37, Benjamin Krajmalnik wrote:
> Thanks for the feedback.
> I wonder if the Dell backplane will take a standard RAID controller
> instead of using the ROMB.
> I may investigate just getting 146GB drives for the DL360 and using
> that instead, and maybe setting up the Dell as a backup server,
> creating backups and moving the files over.

If you've got some spare smaller drives lying about, you can always set
up both and benchmark them against each other; plus, that gives you a
ready-to-go backup machine.

> Quick question - I do not have a transaction frame explicitly declared
> in my stored procedure (which is used to handle all of the inserts).
> In plpgsql, how would I wrap the code inside a transaction frame?

All stored procs run as a single transaction. I.e., the simple act of
making it a user-defined function has made it a single transaction,
with all the benefits that entails.
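
A quick illustration with made-up table names: if anything inside the
function body errors out, every statement it already ran is rolled back
with it.

    CREATE FUNCTION log_result(v integer) RETURNS void AS $$
    BEGIN
        INSERT INTO raw_log(val) VALUES (v);
        -- If this insert fails, the one above rolls back too:
        INSERT INTO daily_snapshot(val) VALUES (v);
    END;
    $$ LANGUAGE plpgsql;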

> This is such a dilemma - we are launching a new service, but until we
> sign up clients on it I don't have the budget to go crazy on hardware.
> I must make do with what I have. What I was thinking was maybe setting
> up the Dell to be the web server and the backup database server, and
> having the production database run on the DL 360, provided I can get
> better database performance. If I can get significantly better
> performance on FreeBSD I will go to it, otherwise I will stay with
> Windows.

I'd test the two. I'm guessing that at least for the initial ramp up,
Windows will be OK. The real performance issue for Windows is that
opening connections is much slower. If the majority of your machine's
time is spent on the query, the overhead of opening connections will be
lost in the noise.

> The stored procedure is a plpgsql function which gets passed
> parameters from the monitoring agent.
> It then dynamically creates a device record for the monitored device
> if one does not yet exist.
> Once that is done it creates a test record for the particular test if
> one does not exist.

Up to now, fine. Performance on any machine should be ok.
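
As a rough sketch, the create-if-missing step usually looks something
like this in plpgsql (hypothetical table and parameter names; note that
without locking, two concurrent callers could still race on the
insert):

    -- Create the device row only if it is not already there.
    PERFORM 1 FROM devices WHERE devicename = p_device;
    IF NOT FOUND THEN
        INSERT INTO devices (devicename) VALUES (p_device);
    END IF;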

> Once that is done, it aggregates the data dynamically into 4 tables:
> a daily snapshot, a weekly snapshot, a monthly snapshot, and a
> dashboard snapshot.

Are you running aggregate functions against the whole table to do this?
If so, this isn't going to scale. If you're just taking the data
entered in this stored proc and adding it to a table that contains that
data, then maybe it's ok. But if it's updating based on a huge amount
of data, then that's going to be slow.
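
To spell out the difference with hypothetical tables: the first form
below stays cheap no matter how big the raw table grows, while the
second re-scans it on every incoming event.

    -- Scales: fold just the incoming value into the snapshot row.
    UPDATE daily_snapshot
       SET total = total + newvalue, samples = samples + 1
     WHERE snap_date = current_date;

    -- Doesn't scale: re-aggregates the whole raw table every time.
    UPDATE daily_snapshot
       SET total = (SELECT sum(val) FROM raw_log
                     WHERE logdate = current_date)
     WHERE snap_date = current_date;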

> Once all of that has been accomplished, it creates a raw log entry
> (which as of next week will go to a given partition). This data goes
> into the partitioned table to facilitate purging expired data without
> hammering the database and having to rebalance indices (I just
> truncate the partition once it is no longer needed).

Sounds reasonable.
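
For reference, a bare-bones sketch of that scheme with
inheritance-based partitioning (names invented): each retention period
lands in its own child table, and purging it is a truncate rather than
a mass delete.

    -- One child table per retention period under the raw_log parent.
    CREATE TABLE raw_log_2006_04 () INHERITS (raw_log);

    -- Purging the period: near-instant, with no index rebalancing on
    -- the surviving partitions.
    TRUNCATE TABLE raw_log_2006_04;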
