From: Randolf Richardson <rr(at)8x(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: *sigh*
Date: 2003-12-12 19:42:23
Message-ID: Xns944F74FB0C557rr8xca@200.46.204.72
Lists: pgsql-hackers

"Christopher Browne <cbbrowne(at)acm(dot)org>" wrote in
comp.databases.postgresql.hackers:

> Martha Stewart called it a Good Thing when Randolf Richardson <rr(at)8x(dot)ca>
> wrote:
[sNip]
>> What about queueing all these updates for a separate
>> low-priority thread? The thread would be the only one with access
>> to update this field.
>
> If updates are "queued," then how do you get to use them if the
> "update thread" isn't running because it's not high enough in
> priority?

That would be an administration issue. The background process would
need enough priority to keep up with the queue, yet it could also be
disabled entirely by administrators who know their systems don't need
"count(*)" at all.

Also, if the thread were well designed, it could first combine all
the queued entries for a given table, so that bringing each table's
count up to date costs only a single write's worth of disk I/O.
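
To make that concrete, here's a rough sketch of that folding step (the
table names "count_queue" and "row_counts" are hypothetical -- just
whatever the triggers would be maintaining):

   begin;
   lock table count_queue in exclusive mode;  -- hold off writers briefly
   -- fold all pending deltas into one write per counted table
   update row_counts rc
      set n = rc.n + q.total
     from (select table_name, sum(delta) as total
             from count_queue
            group by table_name) q
    where rc.table_name = q.table_name;
   delete from count_queue;
   commit;

One pass like that turns any number of queued rows into a single
update per counted table.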

> I am not being facetious.

Oh, I see that. Don't worry, I know better than to take things
personally on newsgroups -- go ahead and be blunt if you like. =D

> The one way that is expected to be successful would be to have a
> trigger that, upon seeing an insert of 5 rows to table "ABC", puts,
> into table "count_detail", something like:
>
> insert into count_detail (table, value) values ('ABC', 5);
>
> You then replace
> select count(*) from abc;
>
> with
> select sum(value) from count_detail where table = 'ABC';
>
> The "low priority" thread would be a process that does something akin
> to vacuuming, where it would replace the contents of the table every
> so often...
>
> for curr_table in (select table from count_detail) do
>     new_total = select sum(value) from count_detail
>                     where table = curr_table;
>     delete from count_detail where table = curr_table;
>     insert into count_detail (table, value)
>         values (curr_table, new_total);
> done
>
> The point of this being to try to keep the number of rows to 1 per
> table.

Interesting -- a different way of solving the same problem. But
wouldn't it require more disk I/O on the table being updated than a
separate tracker would?
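
For concreteness, here's roughly how I picture the trigger side (a
sketch only -- and note that "table" is a reserved word, so I've renamed
the column to "table_name"):

   create table count_detail (table_name name, value bigint);

   create function track_abc_count() returns trigger as '
   begin
       if TG_OP = ''INSERT'' then
           insert into count_detail values (''abc'', 1);
       else  -- a delete
           insert into count_detail values (''abc'', -1);
       end if;
       return null;  -- return value of an AFTER trigger is ignored
   end;
   ' language plpgsql;

   create trigger abc_count after insert or delete on abc
       for each row execute procedure track_abc_count();

Being a row-level trigger, a five-row insert queues five rows of +1
rather than one row of 5, but the compaction pass collapses those
anyway.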

> Note that this gets _real_ expensive for tables that see lots of
> single row inserts and deletes. There isn't a cheaper way that will
> actually account for the true numbers of records that have been
> committed.
>
> For a small table, it will be cheaper to walk through and calculate
> count(*) directly from the tuples themselves.
>
> The situation where it may be worthwhile to do this is a table which
> is rather large (thus count(*) is expensive) where there is some
> special reason to truly care how many rows there are in the table.
> For _most_ tables, it seems unlikely that this will be true. For
> _most_ tables, it is absolutely not worth the cost of tracking the
> information.

Ah, but that's the point -- do we truly care how many rows are in
the table, or is the purpose of "count(*)" just to give us a general
idea?

Because the statistic is updated by a background process, it will
lag behind reality, so "count" won't always be accurate -- but at least
it won't be slow. It could be the fastest "count" in the industry! =)
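
(As an aside: when a general idea really is enough, the planner's
estimate is already sitting in pg_class -- only as fresh as the last
VACUUM or ANALYZE, but free to read:)

   select reltuples as approximate_rows
     from pg_class
    where relname = 'abc';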

--
Randolf Richardson - rr(at)8x(dot)ca
Vancouver, British Columbia, Canada

Please do not eMail me directly when responding
to my postings in the newsgroups.
