Querying a Large Partitioned DB

From: Justin Funk <funkju(at)iastate(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Querying a Large Partitioned DB
Date: 2009-04-10 14:15:56
Message-ID: a4c12a480904100715k21312b71if8ca26039011872@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Team Amazing,

I am building a massive database for storing the syslogs of a room of
servers. The database gets about 25 million entries a day, and need to keep
them for 180 days. So the total size of the database will be about 4.5
billion records.

I need to be able to do full text searches on the message field, and of
course, it needs to be reasonably fast.

The table is partitioned daily and has this structure:
syslog=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+-------
public | systemevents | table | pgsql
public | systemevents_032909 | table | pgsql
public | systemevents_033009 | table | pgsql
public | systemevents_033109 | table | pgsql
public | systemevents_040109 | table | pgsql
public | systemevents_040209 | table | pgsql
public | systemevents_040309 | table | pgsql
public | systemevents_040409 | table | pgsql
public | systemevents_040509 | table | pgsql
public | systemevents_040609 | table | pgsql
public | systemevents_040709 | table | pgsql
public | systemevents_040909 | table | pgsql
public | systemevents_041009 | table | pgsql
(13 rows)

syslog=# \d systemevents
Table "public.systemevents"
Column | Type | Modifiers
--------------------+-----------------------------+-----------
message | character varying |
facility | integer |
fromhost | character varying(80) |
priority | integer |
devicereportedtime | timestamp without time zone |
receivedat | timestamp without time zone |
infounitid | integer |
syslogtag | character varying(80) |
message_index_col | tsvector |
Rules:
systemevents_insert_032909 AS
ON INSERT TO systemevents
WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without
time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp
without time zone DO INSTEAD INSERT INTO systemevents_032909 (message,
facility, fromhost, priority, devicereportedtime, receivedat, infounitid,
syslogtag, message_index_col)
...... [there are rules like that for each partition]

My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english',
'Term') LIMIT 25 OFFSET 0;

Here is an explain analyze:
Limit (cost=2422393.00..2422393.06 rows=25 width=153) (actual
time=93363.496..93363.610 rows=25 loops=1)
-> Sort (cost=2422393.00..2422933.05 rows=216019 width=153) (actual
time=93363.490..93363.532 rows=25 loops=1)
Sort Key: public.systemevents.devicereportedtime
Sort Method: top-N heapsort Memory: 22kB
-> Result (cost=0.00..2416297.10 rows=216019 width=153) (actual
time=20567.267..93362.574 rows=163 loops=1)
-> Append (cost=0.00..2415217.01 rows=216019 width=153)
(actual time=20567.244..93361.582 rows=163 loops=1)
-> Seq Scan on systemevents (cost=0.00..1750240.39
rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1)
Filter: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040309
systemevents (cost=1168.86..54860.45 rows=15253 width=152) (actual
time=82.429..275.589 rows=20 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040309 (cost=0.00..1165.04 rows=15253 width=0) (actual
time=50.029..50.029 rows=20 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040409
systemevents (cost=1038.56..52300.49 rows=14601 width=147) (actual
time=68.006..68.006 rows=0 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040409 (cost=0.00..1034.91 rows=14601 width=0) (actual
time=67.999..67.999 rows=0 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040509
systemevents (cost=1055.06..52482.72 rows=14644 width=150) (actual
time=63.257..63.257 rows=0 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040509 (cost=0.00..1051.40 rows=14644 width=0) (actual
time=63.251..63.251 rows=0 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040609
systemevents (cost=1842.50..88135.00 rows=24506 width=152) (actual
time=117.747..355.043 rows=34 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040609 (cost=0.00..1836.37 rows=24506 width=0) (actual
time=92.079..92.079 rows=34loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040709
systemevents (cost=1844.72..89127.11 rows=24790 width=152) (actual
time=114.387..262.360 rows=24 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040709 (cost=0.00..1838.52 rows=24790 width=0) (actual
time=84.848..84.848 rows=24loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_032909
systemevents (cost=320.54..17254.18 rows=4841 width=142) (actual
time=67.808..67.810 rows=1 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_032909 (cost=0.00..319.33 rows=4841 width=0) (actual
time=56.044..56.044 rows=1 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033009
systemevents (cost=1556.24..75179.65 rows=20931 width=149) (actual
time=77.644..335.360 rows=43 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_033009 (cost=0.00..1551.01 rows=20931 width=0) (actual
time=72.454..72.454 rows=43loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033109
systemevents (cost=1892.97..92637.60 rows=25806 width=149) (actual
time=86.468..86.856 rows=4 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_033109 (cost=0.00..1886.52 rows=25806 width=0) (actual
time=70.397..70.397 rows=4 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040109
systemevents (cost=1395.47..66260.67 rows=18430 width=149) (actual
time=85.711..177.369 rows=12 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040109 (cost=0.00..1390.86 rows=18430 width=0) (actual
time=67.481..67.481 rows=12 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040209
systemevents (cost=1619.92..76738.75 rows=21326 width=150) (actual
time=89.065..89.067 rows=1 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040209 (cost=0.00..1614.58 rows=21326 width=0) (actual
time=73.229..73.229 rows=1 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
Total runtime: 93364.070 ms
(49 rows)

Can you give me any tips and suggestions about how to speed this up?

It seems like a smart query planner would understand the rules and know that
it should search in the last partitions first, since it is ordering by
device reported time.

Thanks!

justin

Responses

Browse pgsql-general by date

  From Date Subject
Next Message bfb21 2009-04-10 14:19:36 Re: Evidently no support for the mmddyyyy date format
Previous Message adam.slachta 2009-04-10 11:57:20 index usage in various scenarious