Re: Count of records in a row

From: Robert James <srobertjames(at)gmail(dot)com>
To: Elliot <yields(dot)falsehood(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Count of records in a row
Date: 2013-10-25 16:43:20
Message-ID: CAGYyBggJjXfDUQv10D=PV8PHKF7Ovff5+fh6w0UhQ7AQRHuzig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ingenious!

I actually think, however, there was a subtle bug in, though I see you fixed it.

The line:
- row_number() over () as d
needs to be:
- row_number() over (order by i asc) as d

I discovered this when working your code into my application. I got
very, very weird results - with one order of columns in the select, I
got the correct answer, but with another one I didn't. After much
debugging, I realized that the original version ("- row_number over
()") wasn't defined! So, depending on how I wrote the select
statement, Postgres could pick different orders!

But I see your cleaned up version already fixed this!

On 10/25/13, Elliot <yields(dot)falsehood(at)gmail(dot)com> wrote:
> Glad I could help. It's easier to understand if you break apart the CTE.
> I'm also moving around the order by i to clean this up a little. Sorry
> for the formatting.
>
> Running this:
> select i,
> val,
> row_number() over (partition by val order by i asc) as class_i,
> row_number() over (order by i asc) as overall_i,
> row_number() over (partition by val order by i asc)
> - row_number() over () as d
> from data
>
> Yields this:
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 4 B 1 4 -3
> 5 C 1 5 -4
> 6 A 4 6 -2
> 7 D 1 7 -6
> 8 A 5 8 -3
> 9 A 6 9 -3
> 10 D 2 10 -8
> 11 D 3 11 -8
> 12 B 2 12 -10
> 13 C 2 13 -11
> 14 C 3 14 -11
>
> class_i counts the row number within a class and overall_i counts the
> overall row number in the sequence. Here's just one class extracted to
> emphasize that:
>
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 6 A 4 6 -2
> 8 A 5 8 -3
> 9 A 6 9 -3
>
> Within a given consecutive run of a particular class the difference
> between class_i and overall_i will always be the same (because they're
> both increasing by the same amount) but that difference between runs
> will always be different (because each run starts the sequences at
> different offsets). "d" is the difference of the two. Because that value
> segments the runs, all that needs to be done is group by it and count
> the items in the group to get the length of the runs.
>
> The xxx column was junk left over from copying and pasting and
> verifying. Apologies :). This is a cleaned up version:
>
> with x
> as
> (
> select i,
> val,
> row_number() over (partition by val order by i asc)
> - row_number() over (order by i asc) as d
> from data
> )
> select val,
> count(*)
> from x
> group by d,
> val
> order by min(i)
> ;
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Robert James 2013-10-25 16:55:08 Detecting change in event properties
Previous Message Rahila Syed 2013-10-25 14:29:34 Increasing CPU usage of PostgreSQL