From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | cbbrowne(at)cbbrowne(dot)com |
Cc: | "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: new procedural language - PL/R |
Date: | 2003-02-03 21:39:06 |
Message-ID: | 3E3EE17A.4030107@joeconway.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
cbbrowne(at)cbbrowne(dot)com wrote:
> What might be "nifty" would be to have some mappings that did Clever
> Transformations of Queries Into Views, particularly if that allowed
> harnessing the DBMS to do some of the statistical analysis behind your
> back...
I'm not quite sure what you mean here, but it does support pulling data into
the R interpreter as a "data.frame" via SPI, and returning R
matricies/vectors/data.frames as either Postgres arrays or as rows and columns
of a table function. Here's two contrived, but illustrative, examples:
create or replace function test_dtup() returns record as
'data.frame(letters[1:10],1:10)' language 'plr';
select * from test_dtup() as t(f1 text, f2 int);
f1 | f2
----+----
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
g | 7
h | 8
i | 9
j | 10
(10 rows)
create or replace function test_spi_tup(text) returns record as
'pg.spi.exec(arg1)' language 'plr';
select * from test_spi_tup('select oid, typname from pg_type where typname =
''oid'' or typname = ''text''') as t(typeid oid, typename name);
typeid | typename
--------+----------
25 | text
26 | oid
(2 rows)
You could easily perform a parameterized query via SPI, retrieve the results
into an R data.frame, do some statistical manipulations, and then return the
results as a table function. The table function itself could be wrapped in a
view to hide the whole thing from the end-user.
You can also create custom aggregates. There has been at least one thread not
too long ago regarding an aggregate to calculate median, for instance. Here it
is in plr:
create table foo(f1 text, f2 float8);
insert into foo values('cat1',1.21);
insert into foo values('cat1',1.24);
insert into foo values('cat1',1.18);
insert into foo values('cat1',1.26);
insert into foo values('cat1',1.15);
insert into foo values('cat2',1.15);
insert into foo values('cat2',1.26);
insert into foo values('cat2',1.32);
insert into foo values('cat2',1.30);
create or replace function r_median(_float8) returns float as 'median(arg1)'
language 'plr';
CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype =
_float8, finalfunc = r_median);
select f1, median(f2) from foo group by f1 order by f1;
f1 | median
------+--------
cat1 | 1.21
cat2 | 1.28
(2 rows)
It's not as fast as the native PostgreSQL functions if you just need average
or standard deviation, but it's alot easier and faster than writing your own
for something more out-of-the-ordinary.
Joe
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2003-02-03 21:52:14 | Re: Win32 Powerfail testing - results |
Previous Message | Jan Wieck | 2003-02-03 21:20:09 | Re: [mail] Re: Windows Build System |