From: | Tim Landscheidt <tim(at)tim-landscheidt(dot)de> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Query to find contiguous ranges on a column |
Date: | 2009-10-13 22:12:58 |
Message-ID: | m3r5t754et.fsf@passepartout.tim-landscheidt.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Peter Hunsberger <peter(dot)hunsberger(at)gmail(dot)com> wrote:
> [...]
> I have one solution that joins the table against itself and does
> (among other things) a subselect looking "not exists col +1" and "not
> exists col -1" on the two instances of the table to find the start and
> end. This is, as you might guess, is not very efficient (my actual
> data is some 6 million+ rows) and I'm guessing there has to be
> something more efficient with windowing or possibly grouping on min
> and max (though I can't see how to make sure they are part of a
> contiguous set). Anyone have any ideas?
You can either use a PL/pgSQL function ("SETOF TEXT" just
for the convenience of the example):
| CREATE FUNCTION SummarizeRanges () RETURNS SETOF TEXT AS $$
| DECLARE
| CurrentFirst INT;
| CurrentLast INT;
| CurrentRecord RECORD;
| BEGIN
| FOR CurrentRecord IN SELECT col FROM t ORDER BY col LOOP
| IF CurrentFirst IS NULL THEN
| CurrentFirst := CurrentRecord.col;
| CurrentLast := CurrentRecord.col;
| ELSIF CurrentRecord.col = CurrentLast + 1 THEN
| CurrentLast := CurrentRecord.col;
| ELSE
| RETURN NEXT CurrentFirst || ', ' || CurrentLast;
| CurrentFirst := CurrentRecord.col;
| CurrentLast := CurrentRecord.col;
| END IF;
| END LOOP;
| IF CurrentFirst IS NOT NULL THEN
| RETURN NEXT CurrentFirst || ', ' || CurrentLast;
| END IF;
| RETURN;
| END;
| $$ LANGUAGE plpgsql;
or a recursive query (which I always find very hard to com-
prehend):
| WITH RECURSIVE RecCols (LeftBoundary, Value) AS
| (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t)
| UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE c.col = p.Value + 1)
| SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols
| GROUP BY LeftBoundary
| ORDER BY LeftBoundary;
Could you run both against your data set and find out which
one is faster for your six million rows?
Tim
From | Date | Subject | |
---|---|---|---|
Next Message | Tim Landscheidt | 2009-10-13 22:22:04 | Re: Procedure for feature requests? |
Previous Message | Greg Smith | 2009-10-13 21:44:38 | Re: Re: [GENERAL] contrib/plantuner - enable PostgreSQL planner hints |