From: | "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> |
---|---|
To: | "Fabien COELHO" <coelho(at)cri(dot)ensmp(dot)fr>, "PostgreSQL Developers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: suggestions to improve postgresql suitability for data-mining |
Date: | 2003-07-23 01:33:54 |
Message-ID: | 0efc01c350ba$7ad71140$2800a8c0@mars |
Lists: pgsql-hackers
> II) SQL
> -------
>
> The first idea is to ask SQL to do the job with a 'group by' clause:
>
> SELECT area, type, month, SUM(amount), COUNT(*)
> FROM client AS c, invoice AS i
> WHERE c.id=i.client
> GROUP BY area, type, month;
>
> As I am only interested in reading the data, without any transactions, I
> tuned the database parameters a little (fsync=false, more shared_mem
> and sort_mem).
>
> It works, but it is quite slow and it requires a lot of disk space.
> Indeed, the result of the join is big, and the aggregation seems to
> require an external sort step so as to sum up data one group after the
> other.
>
> As the resulting table is very small, I wish the optimizer would have
> skipped the sort phase, so as to aggregate the data as they come out of the
> join. All of this could be done on the fly without much additional storage
> (well, with some implementation effort). Maybe this is the "hash evaluation
> of group by aggregates" item listed in the todo list.
As of 7.4 CVS it does do this. You will find this much faster in the 7.4
release.
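
For example, a quick way to check on a 7.4 build is to EXPLAIN the query
(table and column names taken from your example; the plan shape below is
only illustrative, not real output):

    EXPLAIN
    SELECT area, type, month, SUM(amount), COUNT(*)
    FROM client AS c, invoice AS i
    WHERE c.id = i.client
    GROUP BY area, type, month;

    -- Something along these lines, if the planner picks hashed aggregation:
    --  HashAggregate
    --    ->  Hash Join
    --          Hash Cond: (i.client = c.id)
    --          ->  Seq Scan on invoice i
    --          ->  Hash
    --                ->  Seq Scan on client c

If you still see a Sort feeding a GroupAggregate, check enable_hashagg and
try raising sort_mem -- as far as I know the planner only chooses hashed
aggregation when it estimates the hash table will fit in sort_mem.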
Chris