Re: Bit count

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pablo platt <pablo(dot)platt(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Bit count
Date: 2013-09-02 16:24:15
Message-ID: 1378139055.21540.51.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Thu, 2013-08-22 at 19:26 +0300, pablo platt wrote:

> I'll use a bit varying field with unlimited length.
>
> To record unique visitors per day, I'll flip the bit corresponding to
> a user's id.
>
> Each event type will have a separate record.
> It is possible to get useful information using union(&) and
> intersection(|) of several fields.
>
> A field for 1 M users will required 1M bits = 125KB.

Even though you're modifying only a single bit at a time, every update
will consume significant overhead. It will also be fairly hard to query
and require procedural functions like you have.

One thing I would consider is using a "raw events" table for the
incoming data, and then periodically summarize it into another table and
delete the raw data after you summarize it.

Something like:

create table event_raw(ts timestamptz, user_id integer);
create table event_summary(day date, unique_users bigint);

And after a day has passed do:

insert into event_summary(day, unique_users)
select day, count(*)
from
(select distinct ts::date as day, user_id
from event_raw) r
where day < current_date
group by day;
delete from event_raw where ts::date < current_date;

That will make it easier to query. Remember that if you need to include
the current data in a report, you need to do a UNION (and probably make
a view so that it's easier).

If you want to be a little more efficient, you can use a common table
expression to do the delete and insert in one step:

with d as (
delete from event_raw where ts::date < current_date
returning ts, user_id
)
insert into event_summary(day, unique_users)
select day, count(*)
from
(select distinct ts::date as day, user_id from d) r
where day < current_date
group by day;

Also, if you need to do hourly summaries instead of daily, then use
date_trunc() rather than just casting to date.

I hope this helps,
Jeff Davis

In response to

  • Bit count at 2013-08-22 16:26:38 from pablo platt

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message DrYSG 2013-09-02 16:29:25 PostGreSQL and NodeJS
Previous Message Tom Lane 2013-09-02 14:40:56 Re: extension built-in but now showing in dx/dx+