grouping consecutive records

From: Morus Walter <morus(dot)walter(dot)ml(at)googlemail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: grouping consecutive records
Date: 2013-02-04 09:34:54
Message-ID: 20130204103454.0b3c6b23@tucholsky.experteer.muc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hallo,

I have a question regarding a selection.

I'd like to group and merge certain records having the same values in
some columns, but only if they are contiguous with regard to some sort
order.

So for a table
create table foo (
id int,
user_id int,
key varchar,
sort int );

and values e.g.
insert into foo
values ( 1, 1, 'foo', 1),
( 2, 1, 'foo', 2),
( 3, 1, 'bar', 3),
( 4, 1, 'foo', 4),
( 5, 1, 'foo', 5),
( 6, 1, 'foo', 6),
( 7, 1, 'bla', 7),
( 8, 2, 'bar', 1),
( 9, 2, 'foo', 2),
(10, 2, 'foo', 3),
(11, 2, 'bla', 4);

I'd like to merge all consecutive records (ordered by sort, user_id)
having the same value in user_id and key and keep the first/last
value of sort of the merged records (and probably some more values
from the first or last merged record).

So the result should be something like
user_id, key, sort_first, sort_last
1, 'foo', 1, 2
1, 'bar', 3, 3
1, 'foo', 4, 6
1, 'bla', 7, 7
2, 'bar', 1, 1
2, 'foo', 2, 3
2, 'bla', 4, 4

I was trying to do that using window functions, which works great -
except it merges non consecutive occurences (key foo for user_id 1 in
my sample) as well.

select user_id, key, sort_first, sort_last
from (
select user_id,
key,
first_value(sort) over w as sort_first,
last_value(sort) over w as sort_last,
lead(key) over w as next_key
from foo
window w as (partition by user_id, key order by sort
range between unbounded preceding and unbounded following)
) as foo
where next_key is null
order by user_id, sort_first;

user_id | key | sort_first | sort_last
---------+-----+------------+-----------
1 | foo | 1 | 6 <-- would like to have two records
1/2 and 4/6 here
1 | bar | 3 | 3
1 | bla | 7 | 7
2 | bar | 1 | 1
2 | foo | 2 | 3
2 | bla | 4 | 4

Introducing another window on user_id only allows me to keep two records for
1/foo but I still cannot determine the intended sort_first/sort_last.

select user_id, key, sort_first, sort_last
from (
select user_id,
key,
first_value(sort) over w as sort_first,
last_value(sort) over w as sort_last,
lead(key) over u as next_key
from foo
window u as (partition by user_id order by sort),
w as (partition by user_id, key order by sort
range between unbounded preceding and unbounded following)
) as foo
where next_key is null or key != next_key
order by user_id, sort_first;

user_id | key | sort_first | sort_last
---------+-----+------------+-----------
1 | foo | 1 | 6
1 | foo | 1 | 6
1 | bar | 3 | 3
1 | bla | 7 | 7
2 | bar | 1 | 1
2 | foo | 2 | 3
2 | bla | 4 | 4

So the question is: is this doable with a selection?
Can I use window functions for this type of grouping?
Are there other options?

I do have an alternative plan to select records into a temporary table first,
and then do updates merging two consecutive records and repeat that until
all groups are completely merged, but I'd still like to know if I miss
something regarding selection options.

best
Morus

PS: the alternative plan is something like

select id, user_id,
key,
sort,
sort as sort_last,
lead(key) over u as next_key,
lead(id) over u as next_id,
lag(key) over u as prev_key
into temp table footmp
from foo
window u as (partition by user_id order by sort);

update footmp set sort = f2.sort, prev_key = f2.prev_key
from footmp f2
where footmp.id = f2.next_id and
footmp.key = f2.key and
f2.key = f2.next_key and
( f2.prev_key is null or f2.prev_key != f2.key );

delete from footmp
where id in (
select id from (
select first_value(id) over w as id,
count(*) over w as cnt
from footmp
window w as ( partition by user_id, sort )
) as foo where cnt > 1
);

(repeat update/delete until no row is affected)

select user_id,
key,
sort as sort_first,
sort_last
from footmp
order by user_id, sort_first;

pretty ugly and complicated but at least gives me what I want...

Responses

Browse pgsql-general by date

  From Date Subject
Next Message zeljko 2013-02-04 09:35:37 Re: Diferences between IN and EXISTS?
Previous Message wd 2013-02-04 08:35:51 Re: Weird explain output