Quick Links

Postgres on SSD

From:	Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Postgres on SSD
Date:	2011-08-09 23:17:18
Message-ID:	CAM6mieJ1QQnmr7cnzkjmMHkyCZ1d1Ka2wGfCEfJ2oQCzMHFLCQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

I'm about to evaluate this SSD card: FusionIO ioDrive Duo [1]. The
main reason for this experiment is to see if SSD can significantly
improve query performance. So, I have the following questions:

- Could you please share your experience with SSD? Any issues?
- What needs to be changed at Postgres/Operating system level? The
obvious one is to change random_page_cost (now: 2) and seq_page_cost
(now: 4). What else should I look at?

Background:
Database schema is pretty simple:
Database size is around ~1.4TB. Main tables occupied around 1/3
(450GB, ~220mil rows) and aggregated data occupied 2/3 (900GB). All
indexes are on separate table space (~550GB)
1) main table(s)
This big fact table has ~200 columns and average row size is 1.5kb.
This table is never updated and new data is inserted constantly using
copy in chunks about 10k rows. Table is quite sparse so it is broken
into 5 tables which are joined as necessary (regular query reads 10 -
40% of the row size).

Tables are partitioned by month but I'm thinking to use week or two
week partitions.

Primary key is composite key (datetime, organisation, transaction_id).
The transaction_id column is unique but "datetime" column is used for
partitioning and all queries contains organisation="...."
(multi-tetant database). In addition, there are ~15 single column
indexes. Old data is deleted after 6 months (drop partition)

The usual query looks like this:
- select ... from T where organisation = ... and datetime between ...
and ... where ... order by <single col> limit 1000
User can choose any column for sorting but we created indexes for the
most popular/reasonable ones (those ~15 single column indexes).

In the reality, query is more complex because of few Postgres issues:
- partitions/limit/order issue described on Stackoverflow and fixed in
9.1 or 9.2 [2], [3].
- partitions/join issues ie left join "on" clause must contain
datetime condition in order to avoid fulltable scan on joined table

Query response time for indexed columns is between 5 to 30 sec
(sometimes 200sec). The target is to have all queries under 5 sec. If
query has order by on non-indexed column then response time is in
hundreds seconds but desired response time should be 10sec (test query
is over 1 month range and organisation has between 0.5 and 2 mil row
per month; single partition has > 30 mil rows)

2) materialised aggregate tables
About 45 tables like this: agg_attribute1_attribute2(date,
organisation, attribute1, attribute2, count) (= select datetime::date,
organisation, attribute1, attribute2, count(*) from T where
organisation = ... and datetime between ... and ... group by 1,2,3,4)
Tables are updated by cron job every 15 minutes. Thanks for 8.3 HOT
updates - almost no bloat! Monthly partitions are used (date column).
Query response time is between 5 to 30 sec (sometimes 200sec) and the
target is to have all queries under 5 sec

Usual query is:
select attribute1, count(*) from agg_attribute1_... where organisation
= ... and datetime between ... and ... group by 1 limit 10
or
select attribute1, attribute2 count(*) from agg_attribute1_attribute2
where organisation = ... and datetime between ... and ... group by 1
limit 10

Top N queries perform even worse -- the query response time is in
minutes and the target is around 15 sec

Current hardware setup:
XEN host with 16 CPU (Intel(R) Xeon(R) CPU L5520 @ 2.27GHz). CentOS 5.6
80GB RAM
Storage: some Hitachi Fibre channel SAN with two LUNs:
1st LUN has *everything* under $PG_DATA (used 850 GB)
2nd LUN has *all* indexes (index table space) (used 550GB)

Postgres settings:
name |
current_setting

[1] http://www.fusionio.com/products/iodrive-duo/
[2] http://stackoverflow.com/questions/6268633/postgres-partitioning-order-by-performance
[3] http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table

Thanks,
--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

Responses

Re: Postgres on SSD at 2011-08-10 07:18:26 from Tomas Vondra
Re: Postgres on SSD at 2011-08-12 04:57:15 from Greg Smith

Browse pgsql-general by date

	From	Date	Subject
Next Message	Diego Augusto Molina	2011-08-09 23:47:50	Re: Indicating DEFAULT values in INSERT statement
Previous Message	Postgres User	2011-08-09 22:57:46	Indicating DEFAULT values in INSERT statement