From: Roxanne Reid-Bennett <rox(at)tara-lu(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Approach to Data Summary and Analysis
Date: 2014-04-18 05:46:36
Message-ID: 5350BC3C.5030700@tara-lu.com
Lists: pgsql-general pgsql-performance
On 4/16/2014 2:40 PM, Robert DiFalco wrote:
> Thanks Roxanne, I suppose when it comes down to it -- for the current
> use cases and data size -- my only concern is the "calling" query that
> will need to use max to determine if a user has already had a call
> today. For a large user set, for each user I would either have to MAX
> on the answered timestamp to compare it against today, or do an EXISTS
> query to see if any timestamp for that user is greater than or equal
> to "today".
I didn't go back to look at your original schema, but if your 500K
records are coming in time-ordered, you may be able to track "max" as
an attribute on an SCD (slowly changing dimension) based on the
caller/callee table [or on the caller/callee table itself, if that
table is only used by your app], updated from a post-insert trigger on
the appropriate table. Even if they aren't time-ordered, you only add
the overhead of a single comparison in the trigger. The downside is
that you fire a trigger and an update for every insert [or just an
update, depending on what is driving your load of the 500K records].
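A minimal sketch of that trigger, assuming a hypothetical `calls` table with `user_id` and `answered` columns and a `user_call_summary` row pre-populated per user (both names are illustrative, not from the original thread):

```sql
-- Hypothetical per-user summary table tracking the running "max".
CREATE TABLE user_call_summary (
    user_id       bigint PRIMARY KEY,
    last_answered timestamptz
);

CREATE OR REPLACE FUNCTION track_last_answered() RETURNS trigger AS $$
BEGIN
    -- The single comparison: only move the summary forward, so this
    -- stays correct even if rows arrive out of time order.
    UPDATE user_call_summary
       SET last_answered = NEW.answered
     WHERE user_id = NEW.user_id
       AND (last_answered IS NULL OR last_answered < NEW.answered);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER calls_track_last_answered
    AFTER INSERT ON calls
    FOR EACH ROW EXECUTE PROCEDURE track_last_answered();
```

With this in place, "has the user been called today?" becomes a primary-key lookup on `user_call_summary` instead of a MAX() over the call history.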
Again, the proof of the "value" of this overhead is a comparison of
the cost of the updates vs. the cost of the query to find max(). I
suspect your once-a-day query can afford all sorts of other
optimizations that are "better" than a trigger fired on every insert
[such as the functional index that was already mentioned]. I really
suspect you just don't have enough load on the query side (complex
queries * # of users) to justify the extra load on the write side
(+1 trigger, +1 update / insert) to avoid a (potentially) heavy query
load 1x/day.
Another option: if you are only worried about "today", then keep only
today's data in your query table and migrate historical data nightly
to a pseudo-archive table for those "every once in a while" questions.
I haven't played with table inheritance in Postgres, but that's a
capability I might look at if I were doing a pseudo-archive table.
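One possible shape for that nightly migration, using inheritance and a writable CTE (available since 9.1); the table names are again illustrative:

```sql
-- The archive child inherits the parent's columns, so queries against
-- "calls" still see archived rows unless you use ONLY.
CREATE TABLE calls_archive () INHERITS (calls);

-- Nightly job: move everything before midnight out of the hot table
-- in a single statement.
WITH moved AS (
    DELETE FROM ONLY calls
     WHERE answered < date_trunc('day', now())
    RETURNING *
)
INSERT INTO calls_archive SELECT * FROM moved;
```

The "every once in a while" historical questions can then query the parent table (which includes the child), while the hot-path query uses `ONLY calls` and stays small.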
Roxanne