From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: how many record versions
Date: 2004-05-23 21:36:07
Message-ID: 87vfimq114.fsf@stark.xeocode.com
Lists: pgsql-general
David Garamond <lists(at)zara(dot)6(dot)isreserved(dot)com> writes:
> Actually, each record will be incremented probably only thousands of times a
> day. But there are many banners. Each record has a (bannerid, campaignid,
> websiteid, date, countrycode) "dimensions" and (impression, click) "measures".
In the past, when I had a very similar situation, we kept the raw impression and
click event data. Ie, one record per impression in the impression table and
one record per click in the click table.
That makes the tables insert-only, which is efficient and not prone to locking
contention. They never have to be vacuumed except after purging old data.
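As a sketch of what such insert-only event tables might look like (using SQLite for illustration; the table and column names here are illustrative guesses based on the dimensions David listed, not the actual schema):

```python
import sqlite3

# Illustrative schema: one row per raw event, never updated in place.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE impressions (
        bannerid    INTEGER NOT NULL,
        campaignid  INTEGER NOT NULL,
        websiteid   INTEGER NOT NULL,
        countrycode TEXT    NOT NULL,
        ts          TEXT    NOT NULL   -- event timestamp
    );
    CREATE TABLE clicks (
        bannerid    INTEGER NOT NULL,
        campaignid  INTEGER NOT NULL,
        websiteid   INTEGER NOT NULL,
        countrycode TEXT    NOT NULL,
        ts          TEXT    NOT NULL
    );
""")

# Recording an impression or a click is a plain INSERT: no existing row is
# ever updated, so there is no lock contention and nothing to vacuum.
conn.execute("INSERT INTO impressions VALUES (1, 10, 100, 'us', '2004-05-23 11:05:00')")
conn.execute("INSERT INTO impressions VALUES (1, 10, 100, 'de', '2004-05-23 11:06:00')")
conn.execute("INSERT INTO clicks      VALUES (1, 10, 100, 'us', '2004-05-23 11:06:30')")

n_impressions = conn.execute("SELECT count(*) FROM impressions").fetchone()[0]
n_clicks = conn.execute("SELECT count(*) FROM clicks").fetchone()[0]
print(n_impressions, n_clicks)  # 2 1
```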
Then to accelerate queries we had denormalized aggregate tables with a cron
job that did the equivalent of
insert into agg_clicks
select count(*), bannerid
from clicks
where date between ? and ?
group by bannerid
Where the ?s were actually hourly periods. Ie, at 12:15 it ran this query for
the 11-12 period.
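A runnable sketch of that hourly rollup (SQLite here for illustration; the agg_clicks column names are assumptions on my part, and I've used a half-open timestamp range rather than BETWEEN, since BETWEEN is inclusive on both ends and can double-count the boundary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clicks (bannerid INTEGER, ts TEXT);
    CREATE TABLE agg_clicks (n INTEGER, bannerid INTEGER);
""")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", [
    (1, '2004-05-23 11:05:00'),
    (1, '2004-05-23 11:40:00'),
    (2, '2004-05-23 11:59:59'),
    (1, '2004-05-23 12:01:00'),   # outside the 11-12 window
])

# The cron job run at 12:15 aggregates the 11:00-12:00 period.
conn.execute("""
    INSERT INTO agg_clicks (n, bannerid)
    SELECT count(*), bannerid
    FROM clicks
    WHERE ts >= ? AND ts < ?
    GROUP BY bannerid
""", ('2004-05-23 11:00:00', '2004-05-23 12:00:00'))

rows = sorted(conn.execute("SELECT bannerid, n FROM agg_clicks"))
print(rows)  # [(1, 2), (2, 1)]
```

Reports then read the small agg_clicks table instead of scanning millions of raw click rows.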
This meant we didn't have immediate up-to-date stats on banners, but we did
have stats on every single impression and click, including time and
information about the users.
This worked out very well for reporting needs. If your system is using the
data to handle serving the ads, though, it's a different kettle of fish. For
that I think you'll want something that avoids having to do a database query
for every single impression.
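One common way to avoid that per-impression query on the serving side (my own suggestion, not something described above) is to count impressions in application memory and flush the totals to the database in periodic batches:

```python
import sqlite3
from collections import Counter

class ImpressionBuffer:
    """Accumulate impression counts in memory; write them out in one batch."""

    def __init__(self, conn, flush_every=1000):
        self.conn = conn
        self.flush_every = flush_every
        self.counts = Counter()
        self.pending = 0

    def record(self, bannerid):
        # Serving path: just a dict increment, no database round trip.
        self.counts[bannerid] += 1
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # One batched write instead of one query per impression.
        # impression_counts is a hypothetical staging table.
        self.conn.executemany(
            "INSERT INTO impression_counts (bannerid, n) VALUES (?, ?)",
            self.counts.items(),
        )
        self.conn.commit()
        self.counts.clear()
        self.pending = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impression_counts (bannerid INTEGER, n INTEGER)")
buf = ImpressionBuffer(conn, flush_every=3)
for banner in [1, 1, 2, 1]:   # the third record triggers an automatic flush
    buf.record(banner)
buf.flush()                   # flush the remainder at shutdown
total = conn.execute("SELECT sum(n) FROM impression_counts").fetchone()[0]
print(total)  # 4
```

The trade-off is that a crash loses the unflushed counts, which is usually acceptable for ad stats.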
--
greg