Quick Links

Re: performance on selecting a row in large tables

From:	Tino Schwarze <postgresql(at)tisc(dot)de>
To:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: performance on selecting a row in large tables
Date:	2008-02-04 21:46:53
Message-ID:	20080204214653.GC17045@easy2.in-chemnitz.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

Hi Rainer,

On Tue, Feb 05, 2008 at 10:24:11AM +1300, Rainer Spittel wrote:

> You are right, this query is not the right approach for performance
> testing. I thought that this will give an indication about the
> performance of a select statement on that table.

Only do benchmarking with real-world queries. Or you'll end up
optimizing for artificial workloads which never occur.

> One of those slow queries are running on col02 which has a btree
> index. But I use the 'in' expression to get a set of matching rows:
>
> select * from table where col02 in ('...',[...],'...')
>
> This query gets sometimes really slow, I guess it depends on the size
> of the set used by 'in'.

How much in-clauses are there? It might be faster to use a subselect if
the in clauses come from DB anyway.

> Would the query perform better when I cluster the index on col02 and
> force to order the set for the in clause?

> Is there a way to disable the caching for testing? Once I ran the
> query, the result set seems to be cached and the second run of the query
> is fast. This makes a testing a little difficult ;-)

Use "EXPLAIN query" first to get an estimate how the query will perform.
Get used to reading EXPLAIN output. You already seem to have tuned the
DB to your server (setting memory sizes etc.), now optimize queries
first, then the DB server (look at the settings like random_page_cost,
effective_cache_size etc. if the planner performs sequential scans and
you think it should use indices).

If you really want to rule out caching, you'd need to keep trashing your
OS's cache, e.g. by running a "while dd if=/dev/some-drive of=/dev/null
bs=1m ; do : ; done" in the background. But again, you'll optimize for a
artifical workload - in real live, the cache will play a significant
role. There's no use in optimizing queries for no-cache scenario if in
practice, the cache will have the data. I know of real-live applications
where the first query takes some time and all subsequent are fast.
That's what caches are for.

HTH,

Tino.

--
www.craniosacralzentrum.de
www.spiritualdesign-chemnitz.de

Tino Schwarze * Lortzingstraße 21 * 09119 Chemnitz

In response to

Re: performance on selecting a row in large tables at 2008-02-04 21:24:11 from Rainer Spittel

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Kevin Kempter	2008-02-04 22:00:27	last analyze / vacuum
Previous Message	Tom Lane	2008-02-04 21:43:20	Re: performance on selecting a row in large tables