Re: timestamps, formatting, and internals

From: David Salisbury <salisbury(at)globe(dot)gov>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: timestamps, formatting, and internals
Date: 2012-05-18 23:30:23
Message-ID: 4FB6DB8F.4060307@globe.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Actually, figured I'd post the whole function, painful as it
might be for anyone to read. If anyone sees something that's a bit
of a risk ( like perhaps the whole thing ;)

On 5/18/12 5:19 PM, David Salisbury wrote:
>
> I'm trying to debug an intermittent problem I'm seeing in one of our
> rollup scripts.
>
> I'll try to summarize. A table has a measured_at field, of which I
> calculate another
> time value based on that field and a longitude value, called solar_noon,
> and I summarize
> min/max values grouped around this solarnoon. While I'm doing this I
> also calculate a
> minimum time difference between the calcualted solar noon value and all
> the measured_at times
> within the group. I then join this summary table back with the original
> table it's
> summarizing, trying to pick out the one record in it that has the
> measured_at value that's closest
> to the solarnoon value of the grouping.
>
> Clear as mud? Anyways, there seems to be a problem on that last part.
> I'm thinking
> the join on these date values is a bit funky. Perhaps things aren't
> matching up on micro
> second values, but it's hard to know with queries if I'm seeing what the
> DB is seeing, as
> date values are stored in seconds and what queries give you is a format
> of that.
>
> So one question I have is if there a way to set PG in the way Oracle
> does it..
> set nls_date_format = 'YYYY...' so I can query and see exactly what PG
> is seeing,
> even to the microseconds? Is there a config parameter I can set in PG so
> that calculations
> are done only to the second? It seems this join doesn't always find a
> record that's closest
> to solar noon, and therefore drops the summary and join record all
> together.
>
> PG 9.0, Linux
>
>

execute "
CREATE OR REPLACE FUNCTION rollup_and_insert_subdays()
returns void as $$
declare v_created_at timestamp without time zone;
BEGIN
v_created_at := now();
INSERT INTO air_temp_dailies (
uuid,
site_id,
organizationid,
protocol_id,
measured_at,
current_temp_c,
maximum_temp_c,
minimum_temp_c,
created_at,
comments
)
SELECT
subd_summary.uuid,
subd_summary.site_id,
subd_summary.organizationid,
subd_summary.protocol_id,
subd_summary.measured_at,
subd_summary.current_temp_c,
subd_summary.max_temp_c,
subd_summary.min_temp_c,
v_created_at as updated_at,
'automated station rollup' as comments

FROM (
SELECT DISTINCT on ( site_id, solarnoon )
site_id,
organizationid,
protocol_id,
uuid,
solarnoon,
measured_at,
current_temp_c,
max_temp_c,
min_temp_c
FROM (
SELECT
sd.site_id,
sd.organizationid,
sd.protocol_id,
sd.uuid,
sds.solarnoon,
sd.measured_at,
sd.current_temp_c,

sds.max_temp_c as max_temp_c,
sds.min_temp_c as min_temp_c
FROM (
SELECT
site_id,
longitude,
calculate_local_solar_noon(measured_at,longitude) as solarnoon,
max(current_temp_c) as max_temp_c,
min(current_temp_c) as min_temp_c,
min(
abs(
cast(
extract(
epoch FROM (
measured_at -
calculate_local_solar_noon(measured_at,longitude)
)
) as integer
)
)
) as
minimum_time_between_measured_and_solarnoon,
trunc(count(*)/93) as enough_measurements
FROM (
SELECT
site_id,
current_temp_c,
measured_at,
ST_X(point) as longitude
FROM
air_temp_sub_days INNER JOIN sites on (
air_temp_sub_days.site_id = sites.id
)
) as appending_longitude_to_sub_day_values
WHERE
measured_at is not null
GROUP BY
site_id,
calculate_local_solar_noon(measured_at,longitude),
-- we assume that for any site, for any given solarnoon, the
-- longitude is constant, so adding this value in the
-- group by has no effect other than allowing these values
-- to percolate to the outer select
longitude
) sds,
air_temp_sub_days sd
WHERE
sds.site_id = sd.site_id
and
calculate_local_solar_noon(
sd.measured_at,
sds.longitude
) = sds.solarnoon
-- match with the record that's closest to solarnoon.
-- At this point we know the time difference,
-- but not whether it's more or less. The higher level
-- DISTINCT clause removes any duplicates caused should
-- solarnoon fall exactly between two measured_at times.
and
enough_measurements > 0
and
(
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) + (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
or
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) - (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
)
) end_distinct
) subd_summary
LEFT OUTER JOIN air_temp_dailies on
subd_summary.site_id = air_temp_dailies.site_id
and
subd_summary.measured_at = air_temp_dailies.measured_at
WHERE
air_temp_dailies.site_id is null
and
air_temp_dailies.measured_at is null
order by subd_summary.site_id, subd_summary.measured_at
;

END $$ LANGUAGE plpgsql;

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Salisbury 2012-05-18 23:33:00 Re: timestamps, formatting, and internals
Previous Message David Salisbury 2012-05-18 23:19:57 timestamps, formatting, and internals