Quick Links

Re: Critical performance problems on large databases

From:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To:	Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>
Cc:	<pgsql-general(at)postgresql(dot)org>
Subject:	Re: Critical performance problems on large databases
Date:	2002-04-10 23:17:31
Message-ID:	20020410154437.W20046-100000@megazone23.bigpanda.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Wed, 10 Apr 2002, Gunther Schadow wrote:

> Off the bat, this indicates to me that there is something
> sub-obtimal about PostgreSQL handling simple queries. From
> a database that should perform well in online user transactions
> one would want the query processing to be streamed as much
> as possible, i.e., since in a SELECT * FROM Bigtable; there is
> no work needed other than to retrieve the tuples out of
> physical storage, the response should be immediate and resource
> usage low. There should not be large buffer allocations.
>
> Conversely it looks as if PostgreSQL will always read a sizeable
> piece (if not all?) of the result set into some buffer area before
> returning a single row. This would explain the slow startup on
> the SELECT * FROM Bigtable; query as well as the fact that
> COUNT(smallcolumn) behaves much faster than COUNT(*).

IIRC, the entire result set is sent across in the select
* from bigtable case, possibly to allow random access to
the result set? Not sure really.

The usual way to deal with these cases is to use limit/offset
or a cursor to fetch pieces of the data as you want them (ie:
begin;
DECLARE foo CURSOR FOR SELECT * FROM Bigtable;
FETCH 100 from foo;
FETCH 100 from foo;
...
end;
)

> Again, count should be streamed as well such as to use no
> significant memory resources other than the counter. Apparently
> a COUNT(*) in postgres is executed as
>
> SELECT * FROM Bigtable INTO $somebuffer
> COUNT(tuples in $somebuffer)

I believe the thing that takes a long time here is that it has to do a
sequential scan of Bigtable, which for large tables is rather time
consuming. I generally haven't seen large growth of backends
on moderate sized tables for such queries, although due to the
sequential scan I try to avoid count() whenever possible.

In response to

Critical performance problems on large databases at 2002-04-10 22:39:37 from Gunther Schadow

Browse pgsql-general by date

	From	Date	Subject
Next Message	Martijn van Oosterhout	2002-04-10 23:20:32	Re: Keyword position was: Why does this not work? (keyword 'TEXT')
Previous Message	Command Prompt, Inc.	2002-04-10 23:13:51	ANNOUNCE: Mammoth PostgreSQL