| From: | Виктор Егоров <vyegorov(at)gmail(dot)com> | 
|---|---|
| To: | Morus Walter <morus(dot)walter(dot)ml(at)googlemail(dot)com> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: grouping consecutive records | 
| Date: | 2013-02-04 10:14:06 | 
| Message-ID: | CAGnEbohhKmW55oB0FpQd3naXBkwi74E=8DRZBGFjS-MeGpXLaA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
2013/2/4 Morus Walter <morus(dot)walter(dot)ml(at)googlemail(dot)com>:
> I'd like to merge all consecutive records (ordered by sort, user_id)
> having the same value in user_id and key and keep the first/last
> value of sort of the merged records (and probably some more values
> from the first or last merged record).
>
> So the result should be something like
> user_id, key, sort_first, sort_last
> 1, 'foo', 1, 2
> 1, 'bar', 3, 3
> 1, 'foo', 4, 6
> 1, 'bla', 7, 7
> 2, 'bar', 1, 1
> 2, 'foo', 2, 3
> 2, 'bla', 4, 4
This example corresponds to the ORDER BY user_id, sort
while you claim you need to ORDER BY sort, user_id.
I will explain this for the ordering that matches your sample.
You need to group your data, but you should first create an artificial
grouping column.
First, detect ranges of your buckets:
WITH ranges AS (
    SELECT id, user_id, key, sort,
           CASE WHEN lag(key) OVER
                    (PARTITION BY user_id ORDER BY user_id, sort) = key
                THEN NULL ELSE 1 END r
      FROM foo
)
SELECT * FROM ranges;
Here each time a new “range” is found, «r» is 1, otherwise it is NULL.
Now, form your grouping column:
WITH ranges AS (
    SELECT id, user_id, key, sort,
           CASE WHEN lag(key) OVER
                    (PARTITION BY user_id ORDER BY user_id, sort) = key
                THEN NULL ELSE 1 END r
      FROM foo
)
, groups AS (
    SELECT id, user_id, key, sort, r,
           sum(r) OVER (ORDER BY user_id, sort) grp
      FROM ranges
)
SELECT * FROM groups;
Here sum() is used as running total to produce new “grp” values.
Final query looks like this:
WITH ranges AS (
    SELECT id, user_id, key, sort,
           CASE WHEN lag(key) OVER
                    (PARTITION BY user_id ORDER BY user_id, sort) = key
                THEN NULL ELSE 1 END r
      FROM foo
)
, groups AS (
    SELECT id, user_id, key, sort, r,
           sum(r) OVER (ORDER BY user_id, sort) grp
      FROM ranges
)
SELECT min(user_id) user_id, min(key) "key",
       min(sort) sort_first,
       max(sort) sort_last
  FROM groups
 GROUP BY grp
 ORDER BY user_id,sort_first;
Based on this SO answer: http://stackoverflow.com/a/10624628/1154462
-- 
Victor Y. Yegorov
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Morus Walter | 2013-02-04 10:46:41 | Re: grouping consecutive records | 
| Previous Message | Thomas Kellerer | 2013-02-04 10:13:10 | Re: Diferences between IN and EXISTS? |