From: David Yeu <david.yeu@skype.net>
To: pgsql-performance@postgresql.org
Cc: ops@groupme.com
Subject: Performance on large, append-only tables
Date: 2012-02-08 18:03:04
Message-ID: CB582308.955%david.yeu@skype.net
Lists: pgsql-performance
Hi there,
We've got a pretty large table that sees millions of new rows a day, and
we're trying our best to optimize queries against it. We're hoping to find
some guidance on this list.
Thankfully, the types of queries that we perform against this table are
pretty constrained. We never update rows and we never join against other
tables. The table essentially looks like this:
| id | group_id | created_at | everything else… |
Here `id' is an auto-incrementing primary key, `group_id' is the foreign
key we always scope against, and `created_at' is the insertion time. We
have indexes on the primary key and on group_id.
Our queries essentially fall into the following cases:
* … WHERE group_id = ? ORDER BY created_at DESC LIMIT 20;
* … WHERE group_id = ? AND id > ? ORDER BY created_at DESC;
* … WHERE group_id = ? AND id < ? ORDER BY created_at DESC LIMIT 20;
* … WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?;
In human words, we're looking for:
* The most recent (20) rows.
* The most recent rows after a given `id'.
* Twenty rows before a given `id'.
* Pages of twenty rows.
Originally, this table was part of our primary database, but recently we
saw queries take upwards of thirty seconds or more to complete. Since
we're serving web requests, that's basically unacceptable, and it caused a
lot of requests to back up. Our interim solution has been to simply carve
out a new database that hosts only this table, and that has worked to some
degree. We aren't seeing thirty-second-plus database response times
anymore, but some queries still take many seconds, and spinning up a new
master-slave configuration hasn't been cheap.
In the meantime, we're hoping to investigate other ways to optimize this
table and the queries against it. Heroku's data team has suggested balling
up these rows into arrays, where a single row would represent a group_id,
and the data would occupy a single column as an array. We don't have any
experience with this and were wondering if anyone here has tried it.
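As we understand the suggestion, it would look something like the
following (the table and column names here are just for illustration):

    -- One row per group; messages live in an array, in insertion order.
    CREATE TABLE group_messages (
        group_id bigint PRIMARY KEY,
        messages text[]  -- or an array of some composite type
    );

    -- Appending a new row's data:
    UPDATE group_messages
       SET messages = array_append(messages, 'payload')
     WHERE group_id = 1;

    -- Fetching the most recent twenty entries:
    SELECT messages[greatest(array_length(messages, 1) - 19, 1):array_length(messages, 1)]
      FROM group_messages
     WHERE group_id = 1;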
And finally, we're also trying out alternative stores, since it seems like
this data and its retrieval could be well suited to document-oriented
backends. Redis and DynamoDB are currently the best contenders.
Thanks in advance for any help,
Regards,
Dave Yeu & Neil Sarkar
GroupMe