Re: 7.3.1 New install, large queries are slow

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Roman Fail <rfail(at)posportal(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 7.3.1 New install, large queries are slow
Date: 2003-01-16 17:02:38
Message-ID: 20030116085358.E5729-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Thu, 16 Jan 2003, Roman Fail wrote:

> ***********************

Hmm, I wonder if maybe we're going about things backwards in this
case. Does the original database have something like EXPLAIN
that'll show what it's doing? Perhaps that'll give an idea.

> > What does vacuum verbose batchdetail give you (it'll give an idea of pages anyway)
>
> trans=# VACUUM VERBOSE batchdetail;
> INFO: --Relation public.batchdetail--
> INFO: Pages 1669047: Changed 0, Empty 0; Tup 23316674: Vac 0, Keep 0, UnUsed 0.

So about 12 gigabytes of data, then?

> It seems to me that the big, big isolated problem is the index scan on
> batchdetail.tranamount. During this small query, 'sar -b' showed
> consistent 90,000 block reads/sec. (contrast with only 6,000 with
> larger query index scan). 'top' shows the CPU is at 20% user, 30%
> system the whole time (contrast with 2% total in larger query above).

Note that in this case below, you've gotten a sequence scan not an
index scan. (similar to setting enable_indexscan=off performance)

> This results here still seem pretty bad (although not as bad as
> above), but I still don't know what is the bottleneck. And the
> strange sar stats are confusing me.
>
> EXPLAIN ANALYZE SELECT * FROM batchdetail WHERE tranamount BETWEEN 300 AND 499;
> Seq Scan on batchdetail (cost=0.00..2018797.11 rows=783291 width=440) (actual time=45.66..283926.58 rows=783687 loops=1)
> Filter: ((tranamount >= 300::numeric) AND (tranamount <= 499::numeric))
> Total runtime: 285032.47 msec

I'd assume that tranamount values are fairly randomly distributed
throughout the table, right? It takes about 5 minutes for the
system to read the entire table and more for the index scan, so
you're probably reading most of the table randomly and the index
as well.

What values on batchdetail do you use in query where clauses
regularly? It's possible that occasional clusters would help
if this was the main field you filtered on. The cluster
itself is time consuming, but it might help make the index
scans actually read fewer pages.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-01-16 17:16:33 Re: 7.3.1 New install, large queries are slow
Previous Message Andrew Sullivan 2003-01-16 16:46:18 Re: schema/db design wrt performance