From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Jan Wieck <janwieck(at)yahoo(dot)com> |
Cc: | asdf asdasfa <sjg(at)email(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Performance Issues with count() |
Date: | 2002-04-23 23:53:37 |
Message-ID: | 20020424095337.A17391@svana.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Apr 23, 2002 at 07:25:34PM -0400, Jan Wieck wrote:
> In particular, the Perl script uses a hash table with one
> entry for each group. Now what happens if the input data is
> spread out and contains 1G groups? It'll simply blow away
> your script because it runs out of memory. This behaviour is
> unacceptable for a database system, so as you see in the
> Explain output, PostgreSQL sorts and groups the input data in
> temporary files before counting the rows. Due to that, the
> PostgreSQL solution to the problem requires a gazillion of IO
> operations, but it'll work whatever the input data is, given
> that there is enough disk space.
I've always had a little problem with this setup. Sure, if you're using 1
billion groups then you have a problem, but if you are going into only a
small number of groups, the sorting takes forever compared to using a hash.
Currently the planner beleives that the output of a group will be 1/10th of
the input, but I think the square root would be a better estimate.
Here we regularly summerise 2 million rows into 6 groups and it's almost
faster to do 6 sequential scans than it is to do the sort/group.
Besides, the running out of memory argument is silly, because if sorting
moves out to disk when it gets too big, you can do the same with a hash
table.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2002-04-23 23:56:11 | Re: PgAccess patching |
Previous Message | Bruce Momjian | 2002-04-23 23:52:50 | Re: nested transactions |