Re: Q:Aggregrating Weekly Production Data. How do you do it?

From: Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Q:Aggregrating Weekly Production Data. How do you do it?
Date: 2007-09-19 03:32:21
Message-ID: 1190172741.31628.43.camel@neuromancer.home.net
Lists: pgsql-general

On Tue, 2007-09-18 at 09:55 -0500, Scott Marlowe wrote:
> On 9/17/07, Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com> wrote:
>
> > Just wondering how everyone is doing aggregration of production data.
>
> Where I work, we aggregate by the minute in the application, then dump
> those stats entries into the database en masse from each machine. So,
[snip]
> The fail1/2/3 are types of failures, the totalresptime
> time is all the successful requests added together, so that
> totalresptime/success = average for that minute, and max is the
> longest running request ending in that minute.

So, it's basically a summation rather than using avg() all the way.
That would mean using something like bigint for the sum columns to cope
with the large numbers.
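
For my own notes, a minimal sketch of the per-minute summary table I
understand you to mean (fail1/2/3, success and totalresptime are from
your description; everything else is my guess):

  -- Hypothetical per-minute summary table; sums are bigint so they
  -- don't overflow when rolled up further.
  CREATE TABLE minute_stats (
      ts            timestamp NOT NULL,   -- truncated to the minute
      host          text      NOT NULL,
      success       bigint    NOT NULL DEFAULT 0,
      fail1         bigint    NOT NULL DEFAULT 0,
      fail2         bigint    NOT NULL DEFAULT 0,
      fail3         bigint    NOT NULL DEFAULT 0,
      totalresptime bigint    NOT NULL DEFAULT 0,  -- successful resp times summed
      maxresptime   integer   NOT NULL DEFAULT 0,
      PRIMARY KEY (ts, host)
  );

  -- average for a minute = totalresptime / success
  SELECT ts, totalresptime::numeric / NULLIF(success, 0) AS avg_resp
    FROM minute_stats;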
>
> > (getting min/max/count isn't much of an issue. Stdev is the main issue I
> > believe)
> >
> > One such instance I've read about is..
>
> Isn't stddev() working for you?

Stddev is based on the whole population of data: if I aggregate into
slices and later want the stddev across 2 months (8 weeks, say), I
can't get it from the per-slice stddevs.
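
For what it's worth, the trick I've seen described (and was alluding to
above) is to keep count, sum and sum-of-squares per slice; a stddev over
any larger window can then be rebuilt from those totals. A rough sketch
with made-up table/column names:

  -- Per-slice partial aggregates: enough to rebuild avg and stddev
  -- over any larger window without the raw rows.
  CREATE TABLE hourly_resp (
      hr   timestamp NOT NULL PRIMARY KEY,  -- date_trunc('hour', ts)
      n    bigint    NOT NULL,              -- count(resptime)
      sx   numeric   NOT NULL,              -- sum(resptime)
      sxx  numeric   NOT NULL               -- sum(resptime * resptime)
  );

  -- Sample stddev over the last 8 weeks, rebuilt from the partials:
  --   stddev = sqrt( (N*sum(x^2) - sum(x)^2) / (N*(N-1)) )
  SELECT sqrt( (sum(n) * sum(sxx) - sum(sx)^2)
               / (sum(n) * (sum(n) - 1)) ) AS stddev_resp
    FROM hourly_resp
   WHERE hr >= now() - interval '8 weeks';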

>
> What I do is aggregate the minute time slices by grouping by
> date_trunc('xxx',timestamp) and then use that as a subselect to a
> query that does the stddev() outside of that. works pretty well for
> us, and makes it easy to identify trends.

Trend analysis is very critical here, and the only reference I've found
on how to get "stddev" out of aggregated data is the one I posted.

> One of the things we started doing is to aggregate the last six weeks
> data by the day / hour and then comparing the last 24 hours worth of
> data to those six weeks worth to see trends for each hour of each day.
> The queries are huge and ugly, but they work, and run in about 4
> minutes on a fairly good sized chunk of data. We have about 150k to
> 300k entries a day put into this db.

I'm not sure how many thousand entries a day we get (I've yet to count),
but I'm running into trouble with one-to-many relationships, and PG is
choosing to do nested loops etc. (lots of left joins on the same tables).

So I've had to resort to SRFs and function scans. A complex query that
took >2 hours (on my laptop's PG; it runs in 20 min on a high-end MSSQL
box) is reduced to ~60 secs.
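
In case it helps anyone else, the shape of the workaround is roughly
this (the table names tests/failures/lot_info and the function are
entirely made up): fold the expensive one-to-many joins into a
set-returning function, then have the big report query the function
scan instead of the raw left joins.

  CREATE TYPE test_summary AS (
      serial_num  text,
      test_date   timestamp,
      fail_count  bigint
  );

  CREATE OR REPLACE FUNCTION summarise_tests(p_from timestamp,
                                             p_to   timestamp)
  RETURNS SETOF test_summary AS $$
      SELECT t.serial_num,
             t.test_date,
             count(f.id) AS fail_count
        FROM tests t
        LEFT JOIN failures f ON f.serial_num = t.serial_num
       WHERE t.test_date >= $1
         AND t.test_date <  $2
       GROUP BY t.serial_num, t.test_date
  $$ LANGUAGE sql STABLE;

  -- the big report then joins against the function scan:
  SELECT s.*, l.lot_id
    FROM summarise_tests('2007-09-01', '2007-09-08') AS s
    JOIN lot_info l ON l.serial_num = s.serial_num;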

Thanks for the response.

BTW, are user queries an issue? Do they interfere with the loading?
(That's one of my concerns over here.) I noticed that you do the calc in
the app before sending it to the DB, which is a good thing, and every
minute too. (Is it advisable to do it in chunks of 1 min? Less data per
minute versus per 30 min / 1 hour.)
