From: | Adam Cornett <adam(dot)cornett(at)gmail(dot)com> |
---|---|
To: | Jeff Adams <Jeff(dot)Adams(at)noaa(dot)gov> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Date Range Using Months and Days Only |
Date: | 2011-10-19 22:50:30 |
Message-ID: | CAB5sPxbpx3qxfzKbCZ5MLKgf3S7w4Bns5cQ3xvdkEeXG7wvKFA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Oct 19, 2011 at 12:40 PM, Jeff Adams <Jeff(dot)Adams(at)noaa(dot)gov> wrote:
> Greetings,
>
> I have to write a query on a fairly large table of data (>100 million rows)
> where I need to check to see if a date (epoch) falls between a range of
> values. The catch is that the range is defined only by month and day
> values.
> For example the record containing the epoch value will be linked to a table
> that containing columns named start_month, start_day, end_month, end_day
> that define the range. With respect to the range, year does not matter,
> however, some of the ranges will start in November and end in April of the
> next year. Has anyone come across this type of query? I could certainly
> write a function or even include criteria in a query that would extract
> date
> parts of the epoch and then compare against the values in the start_month,
> start_day, end_month, end_day (it might get complex with respect to ranges
> where the start year and end year are different), but I am worried about
> performance. I thought I would seek some input before I floundered through
> the many iterations of poor performing alternatives! Any thoughts would be
> greatly appreciated.
>
> Thanks in advance...
> Jeff
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
create table a (
id_a integer,
epoch integer
);
create table b (
id_b integer,
start_month integer,
start_day integer,
end_month integer,
end_day integer
);
CREATE OR REPLACE FUNCTION get_timestamps(sm integer, sd integer, em
integer, ed integer, year integer)
RETURNS timestamp without time zone[] AS
$BODY$
declare
syear integer := year;
eyear integer := year;
tstamps timestamp[];
begin
if(sm>em) then
-- assume that since the end month is less than the start month is in
the next year
eyear := eyear+1;
end if;
tstamps[0] := (syear|| '-' || sm || '-' || sd)::timestamp;
tstamps[1] := (eyear|| '-' || em || '-' || ed)::timestamp;
return tstamps;
end
$BODY$
LANGUAGE plpgsql STABLE;
create view a_timestamp as
SELECT id_a, TIMESTAMP WITH TIME ZONE 'epoch' + epoch * INTERVAL '1 second'
as tstamp from a;
with ab as(
select
id_a,id_b,get_timestamps(b.start_month,b.start_day,b.end_month,b.end_day,
extract(year from a.tstamp)::integer) ts_arr,a.tstamp
from a_timestamp as a,b
)
select * from ab
where ab.tstamp between ts_arr[0] and ts_arr[1]
This obviously isn't a fast solution to your problem, although converting
the integer epoch to a timestamp in table a would eliminate the view
a_timestamp and you can index the column for some speed up, the real problem
you're facing is that your ranges don't have years, otherwise you could
store everything as a timestamp and then just join using 'between' and
postgres would just need to do an index scan on each table.
-Adam
From | Date | Subject | |
---|---|---|---|
Next Message | Jake Stride | 2011-10-19 23:17:39 | Access to inserted rows via select in a statement |
Previous Message | Raymond O'Donnell | 2011-10-19 21:15:49 | Re: Logging ALL output from psql |