Re: Newbie questions relating to transactions

From: Carl Sopchak <carl(dot)sopchak(at)cegis123(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Newbie questions relating to transactions
Date: 2009-03-08 17:17:12
Message-ID: 200903081317.13158.carl.sopchak@cegis123.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sunday, March 08, 2009, Alvaro Herrera wrote:
> Carl Sopchak wrote:
> > Well, the upgrade to 8.3 seemed to rid me of the command limit, but now
> > I'm running out of memory. I have 2Gb physical and 8Gb swap (after
> > adding 4Gb).
>
> Do you have AFTER triggers on the involved tables? They are recorded on
> memory and we have no mechanism to spill to disk, so it's frequent that
> those cause out-of-memory. If that's the explanation, your workaround
> would be to get rid of them.
No triggers have been defined on any tables in the database...

>
> > Is there a way for me to run this outside of one huge transaction? This
> > really shouldn't be using more than a few hundred megs of RAM (assuming
> > cursor records are all stored in memory)...
>
> Hmm, maybe you're holding too many cursors open and not closing them
> timely? Did you post your function for review?
I am only using one cursor, which is opened and closed repeatedly. It pulls
the base data from the database for the calculations.

I have not posted the function for review yet because its function is
proprietary. However, I have stripped out the proprietary stuff, and include
the code below. I marked everything stripped out by placing a brief
description enclosed within {{ and }}. I left all of the places that the
database is accessed in the code. I changed some of the line wrapping to fit
a reasonable width (which I mention in case you see syntax type errors).

There is a few lines that save the calculation details based on a flag in the
trial_header table. This flag is set to N for the run that I am having
issues with, so these records are not being created. I left that code in
below for completeness...

I realize this isn't probably the cleanest code out there (I'm sure using
prepared statements would help speed), but it was really meant to be a "quick
and dirty" way to calculate the data I need. Any comments or suggestions on
improving the code is welcome.

create or replace function Run_Trial (tid integer) returns void as $proc$

declare
{{ declarations }}
begin
-- Set start time...
program_version := '1.16';
update trial_header set start_timestamp = clock_timestamp(),
run_version = program_version, end_timestamp = null
where trial_id = tid;

-- get rid of prior run, if any:
delete from trial_results where trial_id = tid;
delete from trial_calc_detail where trial_id = tid;

-- Get the trial parameters:
select * into trial_hdr from trial_header where trial_id = tid;

{{ Do some calculations }}

-- Create temp table of data. This simplifies the coding below A LOT.
{{ conditional calc }}

-- (I can't figure out how to do this with a dynamic select and
-- "insert into trial_data select" and I get an error
-- if I make the dynamic SQL a "select into temp table trial_data"...)
-- Do it by brute force, I guess...
drop table if exists trial_data;
create temp table trial_data(
{{ fields }}
);
for row in execute
'select {{ select statement }}' loop
execute 'insert into trial_data values(' || {{ fields }} || ')';
end loop;

create index trial_data_idx on trial_data (data_yyyymm, data_date);
create index trial_data_idx2 on trial_data (data_date);

-- Get date range for the data set we're using
for row in execute 'select min(data_date) as min_date,
max(data_date) as max_date from trial_data' loop
low_data_date := row.min_date;
high_data_date := row.max_date;
end loop;

-- Calculate maximum number of years that data covers
max_years = floor((high_data_date - low_data_date) / 365.25);

-- Loop through all possible "x year" periods
for cur_years in 1 .. max_years loop

-- start from the first period on file:
next_iteration_start := low_data_date;

num_periods := trial_hdr.periods_per_year * cur_years
+ trial_hdr.{{ field }};

for row in execute 'select count(*) as cnt from (
select data_date from trial_data where data_date >= ' ||
quote_literal(next_iteration_start) ||
' Limit ' || to_char(num_periods, '9999999999') || ') a' loop
data_periods := row.cnt;
end loop;

-- Do each "x year" period in data
while data_periods = num_periods loop

-- Initialize calculation

-- used to set sucessive values for next_iteration_start:
iteration_counter := 0;

{{ some calculations }}

for row in execute 'select max(data_date) as max_date' ||
' from ( select data_date from trial_data' ||
' where data_date >= ' ||
quote_literal(next_iteration_start) ||
' order by data_date' ||
' Limit ' || to_char(num_periods, '999999999') || ') a' loop
per_end_date := row.max_date;
end loop;

-- Get data for calculation
open data_cursor for execute 'select * from trial_data' ||
' where data_date >= ' || quote_literal(next_iteration_start) ||
' order by data_date ' ||
' Limit ' || to_char(num_periods, '999999999');

loop -- through periods for calculation
fetch data_cursor into data;
if not found then
exit;
end if;

-- determine next iteration start date:
iteration_counter := iteration_counter + 1;
if iteration_counter = 1 then
{{ calculations }}
end if;
if iteration_counter = 2 then
next_iteration_start := data.data_date;
end if;

{{ calculations based on row data }}

-- save details if requested:
if upper(trial_hdr.save_calc_details) = 'Y' then
insert into trial_calc_detail values( {{ fields }} );
end if;

{{ Calculation }}
end loop; -- through periods for calculation

-- Final calculations:
{{ calculations }}

-- save results:
insert into trial_results values( {{ fields }} );

close data_cursor;

for row in execute 'select count(*) as cnt from (
select data_date from trial_data where data_date >= ' ||
quote_literal(next_iteration_start) ||
' Limit ' || to_char(num_periods, '9999999999') || ') a'
loop
data_periods := row.cnt;
end loop;

end loop; -- Do each "x year" period in data

end loop; -- loop through all possible "x year" periods

-- mark finish time on trial:
update trial_header set end_timestamp = clock_timestamp()
where trial_id = tid;

-- clean up:
drop table trial_data;
end;

$proc$ language plpgsql;

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christian Schröder 2009-03-08 18:37:32 Re: Performance of subselects
Previous Message Thomas Kellerer 2009-03-08 17:13:12 Re: Newbie questions relating to transactions