From: | Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org> |
---|---|
To: | Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Critical performance problems on large databases |
Date: | 2002-04-11 16:09:05 |
Message-ID: | 3CB5B521.4040909@aurora.regenstrief.org |
Lists: | pgsql-general |
Hi,
Thanks to everyone who responded. I want to set this whole COUNT(...)
issue aside for a moment because I reported hearsay instead of my
own experience. I apologize for muddying the waters with that COUNT
issue. But I still want to respond to the SELECT * FROM Bigtable;
issue once more.
[[[I have to say I am a bit confused by some of the responses, which
basically shrugged the problem off, sending the asker anywhere from
back to the schoolbooks to the mental facility. That's not
necessary.]]]
I am delighted to hear that one respondent has no problem with a
2M-row table and gets snappy responses, but unfortunately he didn't
give much detail: were those really select * from bigtable; queries,
or were there WHERE clauses and such that limit the result set
considerably? A number of respondents confirmed my observations,
so I think the problem is real.
There was one remark about Perl or PHP always loading the complete
result set before returning. Bad for them. I don't use either, and
I think it's just bad design to do that on the client, but I don't
care about bad clients. I care about a good server.
The constructive responses suggested that I use LIMIT/OFFSET and
CURSORs. I can see how those could work around the problem, but
I still believe that something is wrong with the PostgreSQL query
executor. Loading the entire result set into a buffer without
need just makes no sense. Good database engines try to provide
for parallel execution of the query plan as much as possible, and
that implies streaming. There is a wealth of literature testifying
to its importance.
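For reference, the two suggested workarounds look roughly like this
in PostgreSQL (the table name "bigtable", its "id" column, and the
cursor name "big_cur" are placeholders, not from the original thread):

```sql
-- Paging with LIMIT/OFFSET: fetch rows 201-300 of a stable ordering.
SELECT * FROM bigtable ORDER BY id LIMIT 100 OFFSET 200;

-- Incremental retrieval with a cursor, inside a transaction,
-- so the client pulls rows in batches instead of all at once.
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM bigtable;
FETCH 100 FROM big_cur;   -- first batch of 100 rows
FETCH 100 FROM big_cur;   -- next batch
CLOSE big_cur;
COMMIT;
```

Note that OFFSET still makes the server scan past the skipped rows,
so the cursor approach is generally the cheaper of the two for
walking a large result set.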
The main reasons for this are (a) use on multi-processor machines,
where one CPU does one task and another CPU does another task on the
same query plan, and the results from CPU 1 are streamed to CPU 2
(hey, I have a 6-processor machine in my basement); and, perhaps more
importantly, (b) buffering without need is inherently bad, because it
wastes memory resources, leads to bursty demand on CPU and network,
and slows perceived response times. Buffering is a complete waste if
the buffer is paged out to disk again, and it isn't flexible or
scalable if buffer pages are fixed into physical memory. Streaming is
especially important if you want to do distributed joins (and though
pgsql doesn't support that yet, it would be foolish to close your
eyes to a fundamental problem and then be forced to rework this in
a hurry when the time comes for distributed PostgreSQL).
So, while my client application might benefit from such things as
cursors and OFFSET/LIMIT, query planning and execution may suffer
from the buffering. And of course, the point is that it makes
sense to design the server such that streaming results to the
client is transparent, because it automatically relieves the strain
on all resources: CPU, storage, and network! Isn't that obvious?
regards
-Gunther
--
Gunther Schadow, M.D., Ph.D. gschadow(at)regenstrief(dot)org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org