Re: Speedier count(*)

From: Mark Cotner <mcotner(at)yahoo(dot)com>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>, Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speedier count(*)
Date: 2005-08-11 04:40:23
Message-ID: BF204EF7.D9F6%mcotner@yahoo.com
Lists: pgsql-performance

Here's a trigger I wrote to serve essentially the same purpose. The nice
thing about this is that it keeps the count up to date for you, though you do
incur a slight overhead on every insert and delete.

CREATE TABLE test (id serial primary key, name varchar(20));

CREATE TABLE rowcount (tablename varchar(50), rowcount bigint default 0);
CREATE INDEX rowcount_tablename ON rowcount(tablename);

CREATE OR REPLACE FUNCTION del_rowcount() RETURNS trigger AS $$
BEGIN
    UPDATE rowcount SET rowcount = rowcount - 1 WHERE tablename = TG_RELNAME;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION add_rowcount() RETURNS trigger AS $$
BEGIN
    UPDATE rowcount SET rowcount = rowcount + 1 WHERE tablename = TG_RELNAME;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER del_rowcount_tr BEFORE DELETE ON test
    FOR EACH ROW EXECUTE PROCEDURE del_rowcount();
CREATE TRIGGER add_rowcount_tr BEFORE INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE add_rowcount();

INSERT INTO rowcount (tablename) VALUES ('test');

root=# select * from test;
id | name
----+------
(0 rows)

Time: 0.934 ms
root=# select * from rowcount;
tablename | rowcount
-----------+----------
test | 0
(1 row)

Time: 0.630 ms
root=# insert into test (name) values ('blah');
INSERT 1190671626 1
Time: 3.278 ms
root=# select * from test;
id | name
----+------
5 | blah
(1 row)

Time: 0.612 ms
root=# select * from rowcount;
tablename | rowcount
-----------+----------
test | 1
(1 row)

Time: 0.640 ms
root=# insert into test (name) values ('blah');
INSERT 1190671627 1
Time: 1.677 ms
root=# select * from test;
id | name
----+------
5 | blah
6 | blah
(2 rows)

Time: 0.653 ms
root=# select * from rowcount;
tablename | rowcount
-----------+----------
test | 2
(1 row)

Time: 0.660 ms
root=# delete from test where id = 6;
DELETE 1
Time: 2.412 ms
root=# select * from test;
id | name
----+------
5 | blah
(1 row)

Time: 0.631 ms
root=# select * from rowcount;
tablename | rowcount
-----------+----------
test | 1
(1 row)

Time: 0.609 ms

One thing to be mindful of: TRUNCATE is NOT accounted for by this scheme,
and unfortunately the rule system doesn't allow truncate operations, so you
can't work around it that way.
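(Editor's note for readers on newer PostgreSQL releases: 8.4 and later do
support statement-level TRUNCATE triggers, so the counter can be reset when
the table is truncated. A minimal sketch, assuming the same `rowcount` table
as above:)

CREATE OR REPLACE FUNCTION reset_rowcount() RETURNS trigger AS $$
BEGIN
    -- Zero the counter for the truncated table.
    UPDATE rowcount SET rowcount = 0 WHERE tablename = TG_RELNAME;
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER reset_rowcount_tr BEFORE TRUNCATE ON test
    FOR EACH STATEMENT EXECUTE PROCEDURE reset_rowcount();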

'njoy,
Mark

On 8/10/05 11:52 PM, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au> wrote:

> Hi Dan,
>
> On Wed, 10 Aug 2005, Dan Harris wrote:
>
>> I have a web page for my customers that shows them count of records
>> and some min/max date ranges in each table of a database, as this is
>> how we bill them for service. They can log in and check the counts
>> at any time. I'd like for the counts to be as fresh as possible by
>> keeping this dynamic, but I will use a periodic 'snapshot'/cron job
>> if that is the only option to speed this up. I have thought about
>> using the table statistics, but the estimate error is probably
>> unacceptable because of the billing purposes.
>>
>> For some reason, the SQL Server we migrated the app from can return
>> count(*) in a split second on multi-million row tables, even though
>> it is a MUCH slower box hardware-wise, but it's now taking many
>> seconds to run. I have read in the archives the problems MVCC brings
>> into the count(*) dilemma forcing Pg to run a seq scan to get
>> counts. Does SQLServer not use MVCC or have they found another
>
> SQL Server probably jumps through a lot of hoops to do fast count(*)s. I'm
> sure we could do something similar -- it's just a question of complexity,
> resources, desirability, etc. There are other solutions, which makes the
> idea of doing it less attractive still.
>
>> approach for arriving at this number? Compounding all the min/max
>> and counts from other tables and all those queries take about a
>> minute to run. The tables will contain anywhere from 1 million to 40
>> million rows.
>>
>> Also, I am using "select ... group by ... order by .. limit 1" to get
>> the min/max since I have already been bit by the issue of min() max()
>> being slower.
>
> I generally pre-generate the results. There are two ways to do this: the
> 'snapshot'/cron job you mentioned, or using rules and triggers to maintain
> 'count' tables. The idea is that if data is added, modified or removed
> from your table, you modify counters in these other tables.
>
> Alternatively, feel free to post your schema and sample queries with
> explain analyze results to this list, or jump on IRC at
> irc.freenode.net #postgresql and someone will be more than happy to look
> through the problem in more detail.
>
> Thanks,
>
> Gavin
>
