Re: Indices for select count(*)?

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>, Alexander Scholz <alexander(dot)scholz1(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Indices for select count(*)?
Date: 2005-12-22 15:33:25
Message-ID: 20051222153325.GU72143@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Dec 22, 2005 at 04:10:50PM +0100, Martijn van Oosterhout wrote:
> Actually, ISTM the trend is going the other way. MySQL has instant
> select count(*), as long as you're only using ISAM. Recent versions of

No comment.

> MSSQL use an MVCC type system and it also scans the whole table. Oracle
> is the only one I've found that has any optimisation on this front.

I think this is more an indication of the power of MVCC over traditional
locking rather than the importance of indexes covering (reading just an
index to satisfy a query). Index covering can be a huge benefit, and I'd
be surprised if MS didn't come out with some way to do it in a future
version. I'm actually a bit surprised they don't do it in SQL2005.

> The thing is, it *is* possible to change PostgreSQL to do counts via
> the index. The problem is, the cost is high enough that we're
> reasonably sure most people don't want to pay it. I've neverneeded an
> exact row count of a large table (estimates are good enough) so I'm not
> sure I'd be willing to pay a price to have it.

I didn't think the method of adding the imperfect known_visible bit to
the indexes had that much overhead, but it's been a while since those
discussions took place. I do recall some issue being raised that will be
very difficult to solve (though again I don't remember the details now).

I agree that SELECT count(*) FROM table; is a pretty bogus use case.
SELECT count(*) FROM table WHERE field = blah; isn't though, and people
often depend on that being extremely fast. When you can do index
covering, that case usually is very fast, and PostgreSQL can be much
slower. Of course, there are ways around that, but it's more work (and
something that I'd bet most developers wouldn't think of).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pete Deffendol 2005-12-22 15:38:46 Sorting array field
Previous Message Jim C. Nasby 2005-12-22 15:22:55 Re: Questions about server.