Re: Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: Joe Uhl <joeuhl(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 22:21:31
Message-ID: 201010121521.32086.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Tuesday 12 October 2010 07:19:57 you wrote:
> >> The biggest single problem with "select count(*)" is that it is
> >> seriously overused. People use that idiom to establish existence, which
> >> usually leads to a performance disaster in the application using it,
> >> unless the table has no more than a few hundred records. The SQL
> >> language, of which PostgreSQL offers an excellent implementation, has
> >> offered the [NOT] EXISTS clause since its inception in the Jurassic
> >> era. The problem is with the sequential scan, not with counting. I'd
> >> even go as far as to suggest that 99% of instances of the "select
> >> count(*)" idiom are probably bad use of the SQL language.
> >
> > I agree, I have seen many very bad examples of using count(*). I will go
> > so far as to question the use of count(*) in my examples here. Is there
> > a better way to come up with a page list than using count(*)? What is
> > the best method to make a page of results and a list of links to other
> > pages of results? Am I barking up the wrong tree here?
>
> One way I have dealt with this on very large tables is to cache the
> count(*) at the application level (using memcached, terracotta, or
> something along those lines) and then increment that cache whenever you
> add a row to the relevant table. On application restart that cache is
> re-initialized with a regular old count(*). This approach works really
> well and all large systems in my experience need caching in front of the
> DB eventually. If you have a simpler system with say a single
> application/web server you can simply store the value in a variable, the
> specifics would depend on the language and framework you are using.

I use this method whenever possible. I talked about it in my first post.
I generally keep a table around that I call counts. It has many rows that
store count numbers from frequently used views.
The one thing that I can't do anything about is the case where you have no
control over the WHERE clause (or where there may be simply too many options
to count everything ahead of time without making things even slower). That is the point
of this entire thread, or was... ;)
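[A minimal sketch of such a counts table, kept current with triggers rather than by the application; all table, column, and trigger names here are illustrative, not from the original posts:]

```sql
-- One row per tracked table; reads become a single-row lookup.
CREATE TABLE counts (
    table_name text PRIMARY KEY,
    row_count  bigint NOT NULL
);

-- Trigger procedure that bumps the stored count on insert/delete.
-- Note: concurrent writers will serialize on the counts row, which
-- is the usual trade-off of this technique.
CREATE OR REPLACE FUNCTION maintain_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE counts SET row_count = row_count + 1
         WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE counts SET row_count = row_count - 1
         WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- AFTER triggers ignore the return value
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_count
AFTER INSERT OR DELETE ON items
FOR EACH ROW EXECUTE PROCEDURE maintain_count();
```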
-Neil-

>
> Another more all-DB approach is to create a statistics table into which
> you place aggregated statistics rows (num deleted, num inserted, totals,
> etc) at an appropriate time interval in your code. So you have rows
> containing aggregated statistics information for the past and some tiny
> portion of the new data happening right now that hasn't yet been
> aggregated. Queries then look like a summation of the aggregated values
> in the statistics table plus a count(*) over just the newest portion of
> the data table and are generally very fast.
>
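[The summation query described above might look like this; `stats_history`, `events`, and the column names are hypothetical stand-ins for whatever schema is actually used:]

```sql
-- Sum the pre-aggregated history, then count(*) only over the thin
-- slice of rows newer than the last aggregation run.
SELECT (SELECT coalesce(sum(num_rows), 0) FROM stats_history)
     + (SELECT count(*)
          FROM events
         WHERE created_at > (SELECT max(aggregated_through)
                               FROM stats_history)) AS total_rows;
```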
> Overall I have found that once things get big the layers of your app
> stack start to blend together and have to be combined in clever ways to
> keep speed up. Postgres is a beast but when you run into things it
> can't do well just find a way to cache it or make it work together with
> some other persistence tech to handle those cases.
