From: David Orme <d(dot)orme(at)imperial(dot)ac(dot)uk>
To: pgsql-novice(at)postgresql(dot)org
Subject: multiple sampling from tables and saving output
Date: 2005-02-07 11:57:26
Message-ID: 4b417e094092311ce074dffc62e865ad@ic.ac.uk
Lists: pgsql-novice
Hi,
I need to perform a statistical bootstrap on data held in a postgresql
database and I was wondering if anyone could recommend strategies.
The process I need to do is a loop of 1000 repetitions of the
following:
1) select a random subset of the data from a table
2) save various summaries of the randomly selected data
I can think of various external ways of doing this - my current plan is
to use a shell script to resend the same set of instructions repeatedly
using 'psql -f instruction_set.sql' - but I was wondering if there is a
canonical way of doing this within PostgreSQL itself. I've had a brief
look at the procedural languages but I'm not sure which is best suited
to this kind of process.
I've included an example of the set of instructions I want to repeat.
Cheers,
David Orme
[running psql 7.3.4 on RHEL 3]
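The shell-driver plan described above could be sketched as below. This is a
dry run that only prints the psql invocations (drop the 'echo' to execute
them); the database name 'mydb' and the ':run' psql variable are my own
placeholders, and it assumes your psql supports the -v option so that each
repetition can write to its own output file:

```shell
#!/bin/sh
# Dry-run sketch: prints one psql invocation per bootstrap repetition.
# Remove the 'echo' to actually run the queries. 'mydb' and ':run' are
# placeholders, not from the original post.
N=1000
i=1
while [ "$i" -le "$N" ]; do
    # -v run=$i would make :run available inside instruction_set.sql,
    # e.g. for naming each repetition's \o output file.
    echo psql -d mydb -v run="$i" -f instruction_set.sql
    i=$((i + 1))
done
```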
-- Select the locations of a random subset of 1096 species
select grid_id, species_id
into temp random_locs
from possible_locations
where species_id in (select rand_pick.species_id from
(select species_id, random() as random_id
from species
order by random_id
limit 1096) as rand_pick);
-- set up tab delimited unaligned
\a
\f '\t'
-- export summary table by grid_id
\o curr_gridid.txt
-- get a count by grid id including nulls
select grid_id, cnt
from behr_grid left join (
    select grid_id, count(distinct species_id) as cnt
    from random_locs
    group by grid_id) as loc_count
using (grid_id)
order by grid_id;
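For keeping the whole loop inside the server, PL/pgSQL (installable on 7.3
with 'createlang plpgsql yourdb') can repeat the sampling and store tagged
summary rows in a table instead of separate \o files. The sketch below is
untested and the names bootstrap_runs and bootstrap_counts are my own
inventions, but it reuses the same sampling and counting queries as above:

```sql
-- Hypothetical sketch (untested). Assumes a results table:
--   CREATE TABLE bootstrap_counts (run integer, grid_id integer, cnt bigint);
-- 7.3 requires the function body as a quoted string literal.
CREATE OR REPLACE FUNCTION bootstrap_runs(integer) RETURNS integer AS '
DECLARE
    n_runs ALIAS FOR $1;
    i integer;
BEGIN
    FOR i IN 1 .. n_runs LOOP
        -- one bootstrap repetition: sample 1096 species, count
        -- distinct species per grid cell (NULL where none fall)
        INSERT INTO bootstrap_counts (run, grid_id, cnt)
        SELECT i, grid_id, loc_count.cnt
        FROM behr_grid LEFT JOIN (
            SELECT grid_id, count(DISTINCT species_id) AS cnt
            FROM possible_locations
            WHERE species_id IN (
                SELECT species_id
                FROM (SELECT species_id, random() AS random_id
                      FROM species
                      ORDER BY random_id
                      LIMIT 1096) AS rand_pick)
            GROUP BY grid_id) AS loc_count
        USING (grid_id);
    END LOOP;
    RETURN n_runs;
END;
' LANGUAGE plpgsql;

SELECT bootstrap_runs(1000);
```

The accumulated results can then be exported in one go, e.g. with \copy
bootstrap_counts to a tab-delimited file, rather than 1000 separate files.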