Re: how to effectively SELECT new "customers"

From: Jan Ostrochovsky <ostrochovsky(at)mobiletech(dot)sk>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: how to effectively SELECT new "customers"
Date: 2014-02-28 11:45:26
Message-ID: 976203811.595346.1393587926835.JavaMail.root@mobiletech.sk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

----- Original Message -----

> From: "David Johnston" <polobo(at)yahoo(dot)com>
> To: pgsql-sql(at)postgresql(dot)org
> Sent: Thursday, February 27, 2014 4:09:34 PM
> Subject: Re: [SQL] how to effectively SELECT new "customers"

> Jan Ostrochovsky wrote
> > Hello, I am solving following task and it seems hard to me to find
> > effective solution. Maybe somebody knows how to help me: We have
> > table
> > "purchases" and each one record is identified by "customer_id". We
> > want to
> > know not only how many different customers did at least one
> > purchase per
> > time period, grouped by time periods (easy task: "COUNT(DISTINCT
> > customer_id)" with "GROUP BY period"), but also to know how many
> > NEW
> > customers there were. We define new customer as customer_id, which
> > had
> > first record in table "purchases" after 12 month of inactivity (no
> > record
> > in table "purchases" previous 12 months). I have found one
> > solution, but
> > it is very slow and ugly. I tried several other concepts, but
> > without
> > success. Any hint could be helpful for me. Thanks in advance! Jano

> Without incorporating additional meta-data about the purchases onto
> the
> customer table the most basic solution would be:

> SELECT DISTINCT customer_id FROM products WHERE date > (now() - '12
> months'::interval)
> EXCEPT
> SELECT DISTINCT customer_id FROM products WHERE date <= (now() - '12
> months'::interval)

> ---

> Another solution:
> WHERE ... >12 AND NOT EXISTS (SELECT ... WHERE <= 12)

> ---

> Depending on the frequency that you need to run this query it may be
> worthwhile to create a materialized view that captures the necessary
> data
> and then whenever a new sale is generated you simply update that view
> by
> changing the attributes of that single customer. At any point you can
> quickly determine, using the view, which customers were active at the
> time
> of last purchases and which ones were dormant or non-existent.

> David J.

Thank you David, your alternatives seem as smart alternatives of my solution, what is based on the difference between COUNT(DISTINCT customer_id) in e.g. 13 months (in case when reporting period is 1 month over previous 12 months) and COUNT(DISTINCT customer_id) in 12 months prior to reporting period (but without the reporting period). But I would also like to incorporate other data in the output, which should seem e.g.:

town, period, payment_channel_id, purchases_count, purchases_total_revenue, distinct_customer_count, distinct_NEW_customer_count
BA, 201401, A, 495, 847, 103, 25
BA, 201401, B, 345, 456, 99, 21
BA, 201312, A, 554, 1021, 105, 30
ZA, 201401, A, 323, 987, 95, 23
...

there is 12-month-long time-window for new customer determination, sliding according to particular period, back to the past

how to achieve this? e.g. by running your or my concept repeatedly (manually or in the higher level loop), but this is too slow... or to find more sophisticated SQL query to accomplish this in one step... any other ideas for such SQL query running once, not repeatedly (in loop) for each reporting period? (I was considering window functions, connectby, grouping sets, ..., but no success yet)

materialized view could be part of that solution, when we will have the right query and uprgade from 9.0 to 9.3 (as I see it: http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html)... but it should be scalable to several to tens purchases every minute

thanks again for your contribution, David

Jano

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message ALMA TAHIR 2014-02-28 13:13:10 Re: Function Issue
Previous Message Jan Ostrochovsky 2014-02-28 11:44:12 Re: how to effectively SELECT new "customers"