Re: Q:Aggregrating Weekly Production Data. How do you do it?

From: Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Q:Aggregrating Weekly Production Data. How do you do it?
Date: 2007-09-19 03:32:21
Message-ID: 1190172741.31628.43.camel@neuromancer.home.net
Lists: pgsql-general

On Tue, 2007-09-18 at 09:55 -0500, Scott Marlowe wrote:
> On 9/17/07, Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com> wrote:
>
> > Just wondering how everyone is doing aggregration of production data.
>
> Where I work, we aggregate by the minute in the application, then dump
> those stats entries into the database en masse from each machine. So,
[snip]
> The fail1/2/3 are types of failures, the totalresptime
> time is all the successful requests added together, so that
> totalresptime/success = average for that minute, and max is the
> longest running request ending in that minute.

So, it's basically a summation rather than using avg() all the way.
That would mean using something like bigint for the sum columns to cope
with the large numbers.
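
For my own notes, a minimal sketch of the per-minute summary table I
understand you to mean (fail1/2/3, success and totalresptime are from
your description; everything else is my guess):

  -- Hypothetical per-minute summary table; sums are bigint so they
  -- don't overflow when rolled up further.
  CREATE TABLE minute_stats (
      ts            timestamp NOT NULL,   -- truncated to the minute
      host          text      NOT NULL,
      success       bigint    NOT NULL DEFAULT 0,
      fail1         bigint    NOT NULL DEFAULT 0,
      fail2         bigint    NOT NULL DEFAULT 0,
      fail3         bigint    NOT NULL DEFAULT 0,
      totalresptime bigint    NOT NULL DEFAULT 0,  -- successful resp times summed
      maxresptime   integer   NOT NULL DEFAULT 0,
      PRIMARY KEY (ts, host)
  );

  -- average for a minute = totalresptime / success
  SELECT ts, totalresptime::numeric / NULLIF(success, 0) AS avg_resp
    FROM minute_stats;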
>
> > (getting min/max/count isn't much of an issue. Stdev is the main issue I
> > believe)
> >
> > One such instance I've read about is..
>
> Isn't stddev() working for you?

Stddev is based on the whole population of data: if I aggregate into
slices and later want the stddev across 2 months (8 weeks, say), I
can't get it from the per-slice stddevs.
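
For what it's worth, the trick I've seen described (and was alluding to
above) is to keep count, sum and sum-of-squares per slice; a stddev over
any larger window can then be rebuilt from those totals. A rough sketch
with made-up table/column names:

  -- Per-slice partial aggregates: enough to rebuild avg and stddev
  -- over any larger window without the raw rows.
  CREATE TABLE hourly_resp (
      hr   timestamp NOT NULL PRIMARY KEY,  -- date_trunc('hour', ts)
      n    bigint    NOT NULL,              -- count(resptime)
      sx   numeric   NOT NULL,              -- sum(resptime)
      sxx  numeric   NOT NULL               -- sum(resptime * resptime)
  );

  -- Sample stddev over the last 8 weeks, rebuilt from the partials:
  --   stddev = sqrt( (N*sum(x^2) - sum(x)^2) / (N*(N-1)) )
  SELECT sqrt( (sum(n) * sum(sxx) - sum(sx)^2)
               / (sum(n) * (sum(n) - 1)) ) AS stddev_resp
    FROM hourly_resp
   WHERE hr >= now() - interval '8 weeks';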

>
> What I do is aggregate the minute time slices by grouping by
> date_trunc('xxx',timestamp) and then use that as a subselect to a
> query that does the stddev() outside of that. works pretty well for
> us, and makes it easy to identify trends.

Trend analysis is very critical here, and the only reference I've found
on how to get "stddev" out of aggregated data is the one I posted.

> One of the things we started doing is to aggregate the last six weeks
> data by the day / hour and then comparing the last 24 hours worth of
> data to those six weeks worth to see trends for each hour of each day.
> The queries are huge and ugly, but they work, and run in about 4
> minutes on a fairly good sized chunk of data. We have about 150k to
> 300k entries a day put into this db.

I'm not sure how many thousand entries a day we get (I've yet to count),
but I'm running into trouble with one-to-many relationships, and PG is
choosing to do nested loops etc. (lots of left joins on the same tables).

So I've had to resort to SRFs and function scans. A complex query that
took >2 hours (on my laptop's PG; it runs in 20 min on a high-end MSSQL
box) is reduced to ~60 secs.
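
In case it helps anyone else, the shape of the workaround is roughly
this (the table names tests/failures/lot_info and the function are
entirely made up): fold the expensive one-to-many joins into a
set-returning function, then have the big report query the function
scan instead of the raw left joins.

  CREATE TYPE test_summary AS (
      serial_num  text,
      test_date   timestamp,
      fail_count  bigint
  );

  CREATE OR REPLACE FUNCTION summarise_tests(p_from timestamp,
                                             p_to   timestamp)
  RETURNS SETOF test_summary AS $$
      SELECT t.serial_num,
             t.test_date,
             count(f.id) AS fail_count
        FROM tests t
        LEFT JOIN failures f ON f.serial_num = t.serial_num
       WHERE t.test_date >= $1
         AND t.test_date <  $2
       GROUP BY t.serial_num, t.test_date
  $$ LANGUAGE sql STABLE;

  -- the big report then joins against the function scan:
  SELECT s.*, l.lot_id
    FROM summarise_tests('2007-09-01', '2007-09-08') AS s
    JOIN lot_info l ON l.serial_num = s.serial_num;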

Thanks for the response.

BTW, are user queries an issue? Do they interfere with the loading?
(That's one of my concerns over here.) I noticed that you do the calc in
the app before sending it to the DB, which is a good thing, and every
minute too. (Is it advisable to do it in chunks of 1 min? Less data per
minute versus per 30 min / 1 hour.)
