| From: | Mike Toews <mwtoews(at)sfu(dot)ca> |
|---|---|
| To: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
| Cc: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Select ranges based on sequential breaks |
| Date: | 2009-06-22 22:05:14 |
| Message-ID: | 4A40001A.2020109@sfu.ca |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
Scott Marlowe wrote:
> On Mon, Jun 15, 2009 at 12:23 PM, Mike Toews<mwtoews(at)sfu(dot)ca> wrote:
>
>> This is easy to compute using a spreadsheet or in R, but how would I do this
>> with SQL? I'm using 8.3. Advice is appreciated.
>>
>
> FYI (and I'm no expert in this area) R is available as a pl for
> postgres, look for pl/R or plR
>
FYI, here is how I implement ranges on sequential breaks in R. Sorry, I
haven't meddled with plR yet, although I'm experience with both R and
postgres. This is all R code:
# Randomly sampled bins: "red", "blue"
dat <- data.frame(date=seq(as.Date("2009-01-01"), by="days", length.out=20))
dat$bin <- factor(sample(c("red","blue"), 10, replace=TRUE,
prob=c(0.4,0.6)))
# Determine where the rows are different; 1=different rows, 0=same rows
dat$breaks <- ifelse(dat$bin != c(TRUE,
as.character(dat$bin[-nrow(dat)])), 1, 0)
# Determine where the continuous parts are:
dat$part <- factor(cumsum(dat$breaks))
# Results vary due to random sampling
print(dat)
... and on the SQL side, simple aggregates like min(), max(etc) can be
used with "GROUP BY part" to determine the start/end dates, length of
duration, etc.
-Mike
| From | Date | Subject | |
|---|---|---|---|
| Next Message | John DeSoi | 2009-06-22 22:14:11 | Re: Information about columns |
| Previous Message | Jack Orenstein | 2009-06-22 21:55:28 | Slight change in query leads to unexpected change in query plan |