From: | Mike Mascari <mascarm(at)mascari(dot)com> |
---|---|
To: | postgresql <pgsql-general(at)postgresql(dot)org> |
Subject: | pl/R questions |
Date: | 2003-08-02 06:55:42 |
Message-ID: | 3F2B606E.6060800@mascari.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello.
I've downloaded and installed pl/R version 0.4.4. My goal is to be
able to perform multivariate linear regression analysis. However, for
the sake of simplicity, I'm performing a single variable regression
and prediction as a test:
1) CREATE TABLE entries(float8 x, float8 y);
I have the following values:
x | y
----+--------
1 | 133890
2 | 135000
3 | 135790
4 | 137300
5 | 138130
6 | 139100
7 | 139900
8 | 141120
9 | 141890
10 | 143230
11 | 144000
12 | 145290
2) CREATE TABLE predictions(float8 x);
I have the following new x values:
x
----
13
14
15
16
17
3) I have the following functions:
-- Reset R global variables
CREATE OR REPLACE FUNCTION r_resetlm() RETURNS integer AS '
xs <<- c()
ys <<- c()
nxs <<- c()
return(1)
' LANGUAGE 'plr' WITH (isStrict);
-- Add a new known x, f(x)
CREATE OR REPLACE FUNCTION r_initknowns(float8, float8) RETURNS
integer AS '
xs <<- c(xs, arg1)
ys <<- c(ys, arg2)
return(1)
' LANGUAGE 'plr' WITH (isStrict);
-- Add a predicting x
CREATE OR REPLACE FUNCTION r_initpredicts(float8) RETURNS integer AS '
nxs <<- c(nxs, arg1)
return(1)
' LANGUAGE 'plr' WITH (isStrict);
-- Generate the predictions
CREATE OR REPLACE FUNCTION r_predict()
RETURNS SETOF RECORD AS '
samples <- data.frame(xs=nxs)
result <- predict(lm(ys ~ xs), samples)
return (result)
' LANGUAGE 'plr' WITH (isStrict);
4) I perform the prediction like so:
select r_resetlm();
select r_initknowns(x, y) from entries;
select r_initpredicts(x) from predictions;
select * from r_predict() as trend(ny float8);
This works, but there are several potential problems using this method:
(A) The function r_resetlm() must be called to reset the global values
before each invocation. Not a big problem, but I would like to avoid
globals, if possible. The relations supplying the data are temporary
tables and thus I cannot refer to their names in static pl/R. I can't
figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize
R variables with the result of the executed queries. I would like to
do something like this, instead:
CREATE OR REPLACE FUNCTION r_predict(text, text)
RETURNS SETOF RECORD AS '
sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x")
plan <- pg.spi.prepare(sql, NA)
pg.spi.execp(plan, NA)
??? Read results into appropriate vectors
samples <- data.frame(xs=nxs)
result <- predict(lm(ys ~ xs), samples)
return (result)
' LANGUAGE 'plr' WITH (isStrict);
(B) I suppose an unqualified SELECT will always invoke r_initknowns()
and r_initpredicts() but is this guaranteed? And guaranteed to only be
executed once for each tuple? If so, then I'm somewhat less bothered
by the use of R globals. Is using the VOLATILE attribute in the CREATE
FUNTION statement sufficient to guarantee that the call will always be
made?
(C) For the life of me, and this is an R question, I cannot figure out
how to get R to perform predictions on multivariate data:
ys <- c(133890, 135000, 135790, 137300, 138130, 139100, 139900,
141120, 141890, 143230, 144000, 145290)
xs1 <- c(1:12)
xs2 <- c(22, 24.5, 27, 33, 36.8, 40, 44, 57, 59, 62, 74, 77)
xm <- cbind(xs1, xs2)
nx1 <- c(13:17)
nx2 <- c(82, 85, 88.3, 90, 95)
samples <- data.frame(xs1=nx1, xs2=nx2)
f <- predict(lm(ys ~ xm), samples)
data.frame(f) yields:
f
1 133949.8
2 134970.2
3 135990.6
4 137008.1
5 138027.5
6 139047.3
7 140066.5
8 141078.3
9 142099.1
10 143119.1
11 144131.7
12 145151.7
Not the predicted y's for the new x1's and x2's. I tried:
f <- predict.mlm(lm(ys ~ xm), samples) and got:
Error in object$coefficients[piv, ] : incorrect number of dimensions
And I have no mlm() to use.
Any clues? I'm a bit of an R newbie. I wrote my own multivariate
linear regression functions years ago in C++, but I'd prefer to
leverage R from PostgreSQL.
Mike Mascari
mascarm(at)mascari(dot)com
PS: Thanks to Joe Conway for this procedural language.
From | Date | Subject | |
---|---|---|---|
Next Message | Lincoln Yeoh | 2003-08-02 09:32:57 | Re: Apache - DBI - Postgresql: Cancelling queries |
Previous Message | Bruce Momjian | 2003-08-02 06:11:40 | Re: plPHP -- sort of an announcement.. but not commercial |