Select Last n Rows Matching an Index Condition (and caches)

From: "Alex Stapleton" <alexs(at)advfn(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Select Last n Rows Matching an Index Condition (and caches)
Date: 2005-03-18 09:29:06
Message-ID: CMEKJGNLDMNLHLKAEDDEKELLHHAA.alexs@advfn.com
Lists: pgsql-general

We have a ~10 million row table, but we are expecting it to get larger, possibly
by a factor of 10 or more. The rows are quite long and fixed length (just
over 500 bytes).

We have an index on (symbol, source, date) on this table, and we run queries
like this:

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;

This gets us the latest 1000 rows for that symbol and source.
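
For reference, the index as described would be something along these lines
(the index name here is just a placeholder, not our actual schema):

-- index name is a placeholder
CREATE INDEX article_symbol_source_date_idx
    ON article (symbol, source, date);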

However, this takes quite a while at the best of times (1-10 seconds). The
query without the ORDER BY and the LIMIT tends to return about 70,000 rows,
which adds up to about 30MB of data. Once the pages are in the cache the query
takes around 100ms, but that is to be expected. Unfortunately, the initial
query required to populate the cache is unacceptably long for a web
application like ours.
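
For what it's worth, the cold-versus-warm difference shows up clearly if you
run the query under EXPLAIN ANALYZE twice in a row and compare the reported
runtimes:

EXPLAIN ANALYZE
SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;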

My (not yet implemented) solution to this problem is to add a SEQUENCE and
index it, so that adding WHERE id > [max_id]-1000 and ordering by time DESC
will reduce the I/O quite a lot (a rough sketch is below). Am I right here?
It would also be nice if there was a way to get PostgreSQL to try and precache
the table's pages; is there any way I could achieve something like that? I
have toyed with creating a ramdisk to store a lookup table of sorts on (we
only care about a few columns initially) to speed this up a bit, but it's a
right pain in the arse to do by the looks of things.
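
Roughly what I have in mind is the following; the column, sequence, and index
names are made up, and [max_id] would come from something like max(id):

-- names (id, article_id_seq, article_id_idx) are placeholders
CREATE SEQUENCE article_id_seq;

ALTER TABLE article ADD COLUMN id bigint;
ALTER TABLE article ALTER COLUMN id SET DEFAULT nextval('article_id_seq');

-- backfill the existing rows
UPDATE article SET id = nextval('article_id_seq');

CREATE INDEX article_id_idx ON article (id);

-- then bound the scan before sorting, taking [max_id] from max(id)
SELECT * FROM article
WHERE symbol=12646 AND source = 19
  AND id > (SELECT max(id) FROM article) - 1000
ORDER BY time DESC
LIMIT 1000;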
