David Yeu <david.yeu@skype.net> wrote:
> We have indices against the primary key and the group_id.
> Our queries essentially fall into the following cases:
>
> * … WHERE group_id = ? ORDER BY created_at DESC LIMIT 20;
> * … WHERE group_id = ? AND id > ? ORDER BY created_at DESC;
> * … WHERE group_id = ? AND id < ? ORDER BY created_at DESC LIMIT 20;
> * … WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?;
>
> In human words, we're looking for:
>
> * The most recent (20) rows.
> * The most recent rows after a given `id'.
> * Twenty rows before a given `id'.
> * Pages of twenty rows.
The first thing I would try is building an index (perhaps
CONCURRENTLY to avoid disrupting production) on (group_id,
created_at). It might also be worth creating an index on (group_id,
id, created_at), but that's a less-sure win.
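For example, with CONCURRENTLY (the table name "messages" below is
just a stand-in for whatever the table is actually called):

  CREATE INDEX CONCURRENTLY messages_group_id_created_at
    ON messages (group_id, created_at);

  -- The less-sure second index:
  CREATE INDEX CONCURRENTLY messages_group_id_id_created_at
    ON messages (group_id, id, created_at);

Keep in mind that CREATE INDEX CONCURRENTLY can't run inside a
transaction block, and if the build fails it leaves an INVALID index
behind which you need to drop before retrying.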
> Originally, this table was part of our primary database, but
> recently we saw queries take upwards of thirty seconds to
> complete. Since we're serving web requests, that's basically
> unacceptable, and it caused a lot of requests to back up.
With only the indexes you mention, it had to be doing either
complete table scans for each request, or a lot of random access to
rows it didn't need.
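If you want to see exactly what the planner is doing, EXPLAIN will
show you; something like this (again, "messages" is just my guess at
the table name, and note that ANALYZE actually runs the query):

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM messages
     WHERE group_id = 123
     ORDER BY created_at DESC
     LIMIT 20;

A Seq Scan node on the table, or a sort over a large number of rows,
is the giveaway.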
> Our interim solution has been to simply carve out a new database
> that hosts only this table, and that has worked to some degree. We
> aren't seeing thirty seconds plus database response times anymore,
> but some queries still take many seconds and the cost of spinning
> up a new master-slave configuration hasn't been cheap.
Well, throwing hardware at something doesn't generally hurt, but
it's not the first solution I would try, especially when the product
you're using has ways to tune performance.
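To give a sense of what I mean by tuning: the settings that matter
most for a read-heavy workload like this tend to be shared_buffers,
effective_cache_size, work_mem, and random_page_cost. You can check
what you're running now with:

  -- Good values depend entirely on your hardware and workload,
  -- so this just shows where you're starting from.
  SHOW shared_buffers;
  SHOW effective_cache_size;
  SHOW work_mem;
  SHOW random_page_cost;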
> In the meantime, we're hoping to investigate other ways to
> optimize this table and the queries against it. Heroku's data team
> has suggested balling up these rows into arrays, where a single
> row would represent a group_id, and the data would occupy a single
> column as an array.
Ugh. You're a long way from needing to give up on the relational
model here.
> And finally, we're also trying out alternative stores, since it
> seems like this data and its retrieval could be well suited to
> document-oriented backends. Redis and DynamoDB are currently the
> best contenders.
Your current use of PostgreSQL is more or less equivalent to driving
a car around in first gear. You might consider a tuned PostgreSQL
as another alternative store. :-)
-Kevin