From: | Robert James <srobertjames(at)gmail(dot)com> |
---|---|
To: | Elliot <yields(dot)falsehood(at)gmail(dot)com> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Count of records in a row |
Date: | 2013-10-25 16:43:20 |
Message-ID: | CAGYyBggJjXfDUQv10D=PV8PHKF7Ovff5+fh6w0UhQ7AQRHuzig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ingenious!
I actually think, however, there was a subtle bug in, though I see you fixed it.
The line:
- row_number() over () as d
needs to be:
- row_number() over (order by i asc) as d
I discovered this when working your code into my application. I got
very, very weird results - with one order of columns in the select, I
got the correct answer, but with another one I didn't. After much
debugging, I realized that the original version ("- row_number over
()") wasn't defined! So, depending on how I wrote the select
statement, Postgres could pick different orders!
But I see your cleaned up version already fixed this!
On 10/25/13, Elliot <yields(dot)falsehood(at)gmail(dot)com> wrote:
> Glad I could help. It's easier to understand if you break apart the CTE.
> I'm also moving around the order by i to clean this up a little. Sorry
> for the formatting.
>
> Running this:
> select i,
> val,
> row_number() over (partition by val order by i asc) as class_i,
> row_number() over (order by i asc) as overall_i,
> row_number() over (partition by val order by i asc)
> - row_number() over () as d
> from data
>
> Yields this:
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 4 B 1 4 -3
> 5 C 1 5 -4
> 6 A 4 6 -2
> 7 D 1 7 -6
> 8 A 5 8 -3
> 9 A 6 9 -3
> 10 D 2 10 -8
> 11 D 3 11 -8
> 12 B 2 12 -10
> 13 C 2 13 -11
> 14 C 3 14 -11
>
> class_i counts the row number within a class and overall_i counts the
> overall row number in the sequence. Here's just one class extracted to
> emphasize that:
>
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 6 A 4 6 -2
> 8 A 5 8 -3
> 9 A 6 9 -3
>
> Within a given consecutive run of a particular class the difference
> between class_i and overall_i will always be the same (because they're
> both increasing by the same amount) but that difference between runs
> will always be different (because each run starts the sequences at
> different offsets). "d" is the difference of the two. Because that value
> segments the runs, all that needs to be done is group by it and count
> the items in the group to get the length of the runs.
>
> The xxx column was junk left over from copying and pasting and
> verifying. Apologies :). This is a cleaned up version:
>
> with x
> as
> (
> select i,
> val,
> row_number() over (partition by val order by i asc)
> - row_number() over (order by i asc) as d
> from data
> )
> select val,
> count(*)
> from x
> group by d,
> val
> order by min(i)
> ;
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Robert James | 2013-10-25 16:55:08 | Detecting change in event properties |
Previous Message | Rahila Syed | 2013-10-25 14:29:34 | Increasing CPU usage of PostgreSQL |