Re: count(*) and bad design was: Experiences with extensibility

From: Sim Zacks <sim(at)compulab(dot)co(dot)il>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: count(*) and bad design was: Experiences with extensibility
Date: 2008-01-10 07:06:50
Message-ID: fm4gal$2rhr$1@news.hub.org
Lists: pgsql-general

It would be an administrative nightmare unless you were tracking very few
WHERE clauses.

Instead of maintaining the count in a trigger, you could use LISTEN/NOTIFY
to have a daemon on the server run the procedure, so the INSERT/DELETE path
itself carries no overhead. Or you could just call the function from a cron
job every 10 minutes...
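
A minimal sketch of the LISTEN/NOTIFY wiring (the channel, function, and
table names here are invented). The NOTIFY can come straight from the
application, or from a cheap statement-level trigger like this one; the
daemon that has issued LISTEN does the actual recounting:

    CREATE FUNCTION notify_count_change() RETURNS trigger AS $$
    BEGIN
        NOTIFY refresh_count;   -- wake the listening daemon
        RETURN NULL;            -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER foo_notify
        AFTER INSERT OR DELETE ON foo
        FOR EACH STATEMENT EXECUTE PROCEDURE notify_count_change();

The daemon's session simply runs LISTEN refresh_count; and, each time a
notification arrives, calls whatever procedure refreshes the stored count.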

Chris Browne wrote:
> zb(at)cybertec(dot)at (Zoltan Boszormenyi) writes:
>> which will be fast and, depending on the initial value of COUNT(*),
>> very close to the exact figure. You can extend the example with more
>> columns if you know your SELECT COUNT(*) ... WHERE conditions in
>> advance, but that way you have to keep several administrative tables
>> for the different monitored tables. Again, this trades some disk
>> space and INSERT/DELETE speed on the monitored tables for a quicker
>> count.
>
> Actually, this approach will be Really Terrible for any cases where
> multiple connections are adding/deleting tuples concurrently, as it
> will force ALL updates to serialize behind the update to the central
> table.
>
> Occasionally, you'll have something even worse, namely a deadlock,
> where two or more of the updates fighting over the single summary
> tuple fall into a bad state, and one of them is forced to give up,
> potentially rolling back its whole transaction.
>
> [Waving hands for a moment]
>
> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table. There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
> - Since each INSERT/DELETE simply does an INSERT into the summary
> table, the ongoing activity *never* blocks anything
>
> - You get the count by requesting
> SELECT SUM(rec_cnt) as record_count from record_count where tablename = 'foo';
>
> - Note that the query is MVCC-consistent with the table!
>
> - Once in a while, you'll want to run a single procedure that, for
> each table, deletes all the existing records, and replaces them
> with a single one consisting of the sum of the individual values.
>
> - You can re-sync a table by running the query:
> begin;
> delete from record_count where tablename = 'foo';
> insert into record_count(tablename, rec_cnt) select 'foo', (select count(*) from foo);
> commit;
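
For concreteness, the delta-row scheme might look roughly like this (a
sketch only, not Chris's actual code; the table, function, and trigger
names are invented, and the compaction step uses a writable CTE, which
needs PostgreSQL 9.1 or later):

    CREATE TABLE record_count (
        tablename text   NOT NULL,
        rec_cnt   bigint NOT NULL
    );

    -- Row-level trigger: append +1 per INSERT, -1 per DELETE.
    -- Plain INSERTs into record_count never contend for a single
    -- row, so concurrent writers neither serialize nor deadlock.
    CREATE FUNCTION track_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO record_count VALUES (TG_TABLE_NAME, 1);
        ELSE
            INSERT INTO record_count VALUES (TG_TABLE_NAME, -1);
        END IF;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER foo_track_count
        AFTER INSERT OR DELETE ON foo
        FOR EACH ROW EXECUTE PROCEDURE track_count();

    -- Periodic compaction: fold the accumulated deltas for one table
    -- back into a single row. The DELETE only removes rows visible to
    -- its snapshot, so deltas from concurrent writers are untouched.
    BEGIN;
    WITH gone AS (
        DELETE FROM record_count WHERE tablename = 'foo'
        RETURNING rec_cnt
    )
    INSERT INTO record_count
        SELECT 'foo', coalesce(sum(rec_cnt), 0) FROM gone;
    COMMIT;

The per-statement optimization mentioned above can be written on
PostgreSQL 10 and later with transition tables (REFERENCING NEW TABLE /
OLD TABLE in CREATE TRIGGER), so that a 30-row INSERT adds a single
delta row of 30 instead of 30 rows of 1.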
