Re: Bit count

From: pablo platt <pablo(dot)platt(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Bit count
Date: 2013-09-03 11:41:04
Message-ID: CANdLC8Uc4czNuEpCxKiHSsZE8U1hnD-34jKF=Xki=DjX9M7HBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I'll try it.

Thank you.

On Mon, Sep 2, 2013 at 7:24 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Thu, 2013-08-22 at 19:26 +0300, pablo platt wrote:
>
> > I'll use a bit varying field with unlimited length.
> >
> > To record unique visitors per day, I'll flip the bit corresponding to
> > a user's id.
> >
> > Each event type will have a separate record.
> > It is possible to get useful information using union(&) and
> > intersection(|) of several fields.
> >
> > A field for 1 M users will required 1M bits = 125KB.
>
> Even though you're modifying only a single bit at a time, every update
> will consume significant overhead. It will also be fairly hard to query
> and require procedural functions like you have.
>
> One thing I would consider is using a "raw events" table for the
> incoming data, and then periodically summarize it into another table and
> delete the raw data after you summarize it.
>
> Something like:
>
> create table event_raw(ts timestamptz, user_id integer);
> create table event_summary(day date, unique_users bigint);
>
> And after a day has passed do:
>
> insert into event_summary(day, unique_users)
> select day, count(*)
> from
> (select distinct ts::date as day, user_id
> from event_raw) r
> where day < current_date
> group by day;
> delete from event_raw where ts::date < current_date;
>
> That will make it easier to query. Remember that if you need to include
> the current data in a report, you need to do a UNION (and probably make
> a view so that it's easier).
>
> If you want to be a little more efficient, you can use a common table
> expression to do the delete and insert in one step:
>
> with d as (
> delete from event_raw where ts::date < current_date
> returning ts, user_id
> )
> insert into event_summary(day, unique_users)
> select day, count(*)
> from
> (select distinct ts::date as day, user_id from d) r
> where day < current_date
> group by day;
>
> Also, if you need to do hourly summaries instead of daily, then use
> date_trunc() rather than just casting to date.
>
> I hope this helps,
> Jeff Davis
>
>
>
>

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message James David Smith 2013-09-04 11:40:29 Writing results while loop ongoing?
Previous Message Luca Ferrari 2013-09-03 11:36:03 Re: Duplicating a table with a trigger