Quick Links

Re: performance of count(*)

From:	Tomas Vondra <tv(at)fuzzy(dot)cz>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: performance of count(*)
Date:	2011-05-06 19:39:55
Message-ID:	4DC44E8B.5040207@fuzzy.cz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Dne 6.5.2011 20:45, Scott Ribe napsal(a):
> I need to optimize queries that deal with some aggregates regarding
> resource availability. My specific problem is, I think, very closely
> analogous to select count(*)... where...
>
> I know roughly how to do it, aggregated stats table, triggers
> appending to it, occasional updates to coalesce entries. I'd just like
> to see an example to confirm my own plan and see if I'm missing any
> details.
>
> I'm sure I've seen references to articles on ways to do this, but all
> google is getting me is generic complaints about count(*) performance
> and suggestions to use stats for estimated total rows in a table,
> nothing useful for this.
>

Well I guess you got most of that right - just create a table to hold
aggregated values, and then a bunch of triggers to update it. The
structute and implementation of the triggers really depend on your
needs, but in general there are two approaches - eager and lazy.

Eager - the triggers immediately update the aggregates (increment a
count, add a value to the sum etc.). This is a 'primitive' and less
complex solution, suitable when the update is cheap and the aggregated
table is often read.

Lazy - just mark the row in the aggregated table as 'dirty' and
recompute it only if it's read. This is suitable if the table is read
only occasionaly and/or when computing the aggregate is complex (e.g.
needs to reread the whole dataset - as for example computing variance).

The lazy approach is not that usual, a great example how to implement
that is available here:

http://www.pgcon.org/2008/schedule/events/69.en.html

Anyway I'd recommend to start with the eager approach, it's much easier
to implement. You can implement the lazy approach later, if you find out
it's needed.

And you should strive to use HOT feature (if you're on >= 8.4),
especially with the eager approach - it often does a lot of updates and
leads to bloat of the aggregated table. So decrease the fillfactor and
do not index the columns that are updated by the triggers.

regards
Tomas

In response to

performance of count(*) at 2011-05-06 18:45:23 from Scott Ribe

Responses

Re: performance of count(*) at 2011-05-06 21:43:02 from Scott Ribe

Browse pgsql-general by date

	From	Date	Subject
Next Message	Andrew Sullivan	2011-05-06 19:59:02	Re: performance of count(*)
Previous Message	Scott Ribe	2011-05-06 18:45:23	performance of count(*)