Querying a time range across multiple partitions

From: Cal Heldenbrand <cal(at)fbsdata(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Querying a time range across multiple partitions
Date: 2014-09-05 17:31:30
Message-ID: CAAcwKheH7rCjdCbvyj-zEN=GZvrj2zGV9tywC-Q72BCA=T+CNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi everyone,

I'm trying to run a select query from a span of child partitions, separated
out in daily tables, in Postgres 9.1.5. The parent looks like this:

# \d logins
Table "public.logins"
Column | Type | Modifiers
-------------+-----------------------------+-----------
username | character varying(50) |
event | character varying(20) |
time | timestamp without time zone |
host | character varying(18) |
hash | character varying(32) |
Triggers:
logins_log_trigger BEFORE INSERT ON logins FOR EACH ROW EXECUTE
PROCEDURE logins_insert_trigger()
Number of child tables: 1581 (Use \d+ to list them.)

And an example child table:

# \d logins_20140904
Table "public.logins_20140904"
Column | Type | Modifiers
-------------+-----------------------------+-----------
username | character varying(50) |
event | character varying(20) |
time | timestamp without time zone |
host | character varying(18) |
hash | character varying(32) |
Indexes:
"logins_20140904_event" hash (event)
"logins_20140904_event_time" btree (event, "time")
"logins_20140904_username" hash (username)
"logins_20140904_username_time" btree (username, "time")
Check constraints:
"logins_20140904_time_check" CHECK ("time" >= '2014-09-04
00:00:00'::timestamp without time zone AND "time" <= '2014-09-04
23:59:59.99'::timestamp without time zone)
Inherits: logins

I'm attempting to run a query that looks something like this:

explain analyze select time,event from logins
where username='bob' and hash='1234' and time > current_date - interval
'1 week';

Result (cost=0.00..765.11 rows=1582 width=14)
-> Append (cost=0.00..765.11 rows=1582 width=14)
-> Seq Scan on logins (cost=0.00..0.00 rows=1 width=66)
Filter: (((username)::text = 'bob'::text) AND ((hash)::text =
'1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
-> Index Scan using logins_20100501_username_time on
logins_20100501 logins (cost=0.01..0.48 rows=1 width=14)
...

This shows that it's attempting to run the query against all of my 1500
child tables.

If I change it to manually specify dates with an IN clause, it selects the
appropriate tables:

explain analyze select time,event from logins
where username='bob' and hash='1234' and time in ('2014-09-04',
'2014-09-05', '2014-09-03');

Result (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060 rows=0
loops=1)
-> Append (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060
rows=0 loops=1)
-> Seq Scan on logins (cost=0.00..0.00 rows=1 width=66) (actual
time=0.001..0.001 rows=0 loops=1)
Filter: (((username)::text = 'bob'::text) AND ((hash)::text
= '1234'::text) AND ("time" = ANY ('{"2014-09-04 00:00:00","2014-09-05
00:00:00","2014-09-03 00:00:00"}'::timestamp without time zone[])))
-> Bitmap Heap Scan on logins_20140903 logins (cost=1.09..1.20
rows=1 width=14) (actual time=0.039..0.039 rows=0 loops=1)
...

I know I could construct a query manually in my client by walking back
through the calendar dates, but for my own curiosity I'd like to find out a
more elegant way to run this query.

So far, I've been messing around with generating a series, then collecting
it back into an array, but nothing I've tried seems to work.

A few examples:

explain analyze select time,event from logins
where username='bob' and hash=1234' and time in (
generate_series(current_date - interval '3 days', current_date,
interval '1 day')
);
ERROR: argument of IN must not return a set

explain analyze select time,event from logins
where username='bob' and hash='1234' and time in (
select array_agg(series)
from generate_series(current_date - interval '3 days', current_date,
interval '1 day')
as u(series)
);
ERROR: operator does not exist: timestamp without time zone = timestamp
without time zone[]

explain analyze select time,event from logins
where username='bob' and hash=1234 and time in (
select unnest(array_agg(date_trunc('day',series))) from
generate_series(current_date - interval '3 days', current_date,
interval '1 day') as u(series)
)
-- No errors, but still queries all child tables.

Thank you!

--Cal

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Grittner 2014-09-05 18:15:28 Re: count on cascading deletes
Previous Message Eildert Groeneveld 2014-09-05 16:49:15 count on cascading deletes