Re: Performance of count(*)

From: Steve Atkins <steve(at)blighty(dot)com>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of count(*)
Date: 2007-03-22 18:37:19
Message-ID: 79B982A3-9C3B-47D6-896E-F6553004ECE1@blighty.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Mar 22, 2007, at 11:26 AM, Guido Neitzer wrote:

> On 22.03.2007, at 11:53, Steve Atkins wrote:
>
>> As long as you're ordering by some row in the table then you can
>> do that in
>> straight SQL.
>>
>> select a, b, ts from foo where (stuff) and foo > X order by foo
>> limit 10
>>
>> Then, record the last value of foo you read, and plug it in as X
>> the next
>> time around.
>
> This does only work if you have unique values in foo. You might
> have "batch breaks" inside a list of rows with equal values for foo.

If I don't have unique values in foo, I certainly have unique values
in (foo, pk).

>
> But: a web application that needs state and doesn't maintain it by
> itself (or inside the dev toolkit) is imho broken by design. How
> should the database store a "state" for a web app? It's only
> possible on the web app part, because the app is either stateless
> and so are the queries to the database - they have to be re-
> evaluated for every request as the request might come from totally
> different sources (users, ...) or it is stateful and has to
> maintain the state because only the app developer knows, what
> information is needed for the "current state".
>
> This is why all web application toolkits have a "session" concept.

Yes. HTTP is not very stateful. Web applications are stateful. There
are some really obvious approaches to maintaining state cleanly that
work well with databases and let you do some quite complex stuff
(tying a persistent database connection to a single user, for
instance). But they don't scale at all well.

What Craig was suggesting is, basically, to assign a persistent
database connection to each user. But rather than maintain that
connection as a running process, to serialise all the state out of
the database connection and store that in the webapp, then when the
next action from that user comes in take a database connection and
stuff all that state into it again.

It's a lovely idea, but strikes me as completely infeasible in the
general case. There's just too much state there. Doing it in the
specific case is certainly possible, but rapidly devolves to the
standard approach of "On the first page of results, run the query and
record the first 5000 results. Store those in a scratch table,
indexed by session-id, or in external storage. On displaying later
pages of results to the same user, pull directly from the already
calculated results."

>
>> I think the problem is more that most web developers aren't very good
>> at using the database, and tend to fall back on simplistic, wrong,
>> approaches
>> to displaying the data. There's a lot of monkey-see, monkey-do in web
>> UI design too, which doesn't help.
>
> Sure. That is the other problem ... ;-) But, and I think this is
> much more important: most toolkits today free you from using the
> database directly and writing lots and lots of lines of sql code
> which instantly breaks when you switch the storage backend. It's
> just the thing from where you look at something.

The real problem is the user-interface is designed around what is
easy to implement in elderly cgi scripts, rather than what's
appropriate to the data being displayed or useful to the user.
Displaying tables of results, ten at a time, is just one of the more
egregious examples of that.

Cheers,
Steve

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-03-22 18:37:26 Re: Performance of count(*)
Previous Message Guido Neitzer 2007-03-22 18:26:12 Re: Performance of count(*)