Quick Links

Re: slow select in big table

From:	Sam Mason <sam(at)samason(dot)me(dot)uk>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: slow select in big table
Date:	2009-04-03 16:28:47
Message-ID:	20090403162847.GW12225@frubble.xen.chris-lamb.co.uk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Fri, Apr 03, 2009 at 01:20:33AM -0700, rafalak wrote:
> QUERY PLAN without changes
> Aggregate (cost=98018.96..98018.97 rows=1 width=4) (actual time=64049.326..64049.328 rows=1 loops=1)
> -> Bitmap Heap Scan on tbl_photos_keywords (cost=533.23..97940.02 rows=31577 width=4) (actual time=157.787..63905.939 rows=119154 loops=1)
> Recheck Cond: (keyword_id = 14)
> -> Bitmap Index Scan on keyword_id (cost=0.00..525.33 rows=31577 width=0) (actual time=120.876..120.876 rows=119154 loops=1)
> Index Cond: (keyword_id = 14)

Pulling in 120k rows from disk is always going to take a long time.

> Total runtime: 64049.686 ms

If they're scattered randomly across the table it's going to take a
lot longer even than this; assuming a 8ms average seek time that's 30
minutes (two seeks for each row) and is why PG was preferring to pick a
bitmap heap scan over an index scan. The reason it only took one minute
is because the data isn't randomly distributed.

> QUERY PLAN with changes

> Aggregate (cost=30546.30..30546.31 rows=1 width=4) (actual time=1710.100..1710.102 rows=1 loops=1)
> -> Index Scan using keyword_id on tbl_photos_keywords (cost=0.00..30467.36 rows=31577 width=4) (actual time=0.150..1573.843 rows=119154 loops=1)
> Index Cond: (keyword_id = 14)

> Total runtime: 1710.185 ms

Is the cache hot now? i.e. did you run the tests in that order and hence
the first run pulled the data off disk and into memory where the second
test could use it without hitting the disk?

If you're running a recent Linux, you can tell it to drop the disk
cache:

http://linux-mm.org/Drop_Caches

If you do this and then restart PG (because PG has its own caches) you
should be able to see what's going on better.

> What else can be changed ?

Disks are slow and it's the seek time which is killing you; as your
dataset isn't too big you could upgrade the memory in the machine so the
entire table can be in memory at once and then you won't have a problem.

If you want to optimise this case you could also look at CLUSTERing this
table on the keyword.

--
Sam http://samason.me.uk/

In response to

Re: slow select in big table at 2009-04-03 08:20:33 from rafalak

Browse pgsql-general by date

	From	Date	Subject
Next Message	Erik Jones	2009-04-03 16:52:38	Re: How to capture an interactive psql session in a log file?
Previous Message	Mahlon E. Smith	2009-04-03 16:18:08	Re: How to capture an interactive psql session in a log file?