Re: Need magical advice for counting NOTHING

From: Andreas <maps(dot)on(at)gmx(dot)net>
To: nha <lyondif02(at)free(dot)fr>
Cc: PgSQL-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Need magical advice for counting NOTHING
Date: 2009-07-25 20:46:30
Message-ID: 4A6B6F26.9040303@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

nha schrieb:
> Hello,
>
> Le 23/07/09 10:23, Glenn Maynard a écrit :
>
>> On Thu, Jul 23, 2009 at 1:01 AM, Andreas<maps(dot)on(at)gmx(dot)net> wrote:
>>
>>> SELECT user_name, log_type_fk, COUNT(log_type_fk)
>>> FROM log
>>> JOIN users ON (user_id = user_fk)
>>> WHERE (ts IS BETWEEN sometime AND another)
>>> GROUP BY user_name, log_type_fk
>>> ORDER BY user_name, log_type_fk
>>>
>> [...]
>> SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT
>> NULL)::integer) AS count
>> FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk
>> AND log.log_type_fk = log_type.log_type)
>> GROUP BY user_name, log_type.log_type
>> ORDER BY user_name, log_type.log_type;
>> [...]
>>
>
> In the same direction as Andreas Krestchmer and Glenn Maynard (thanks to
> the latter for DDL assumed statements), the following query should also
> suit:
>
> SELECT user_name, log_type, COUNT(log_type_fk)
> FROM (users CROSS JOIN log_type)
> LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk)
> WHERE (ts IS BETWEEN sometime AND another)
> GROUP BY user_name, log_type
> ORDER BY user_name, log_type
>
I tried it and found it looses the COUNT() = 0 lines because of the date
filtering at this position.
ts is a columns of log. The tricky lines are those log_types that aren't
in log so (ts IS NULL) here and NULL can't be compared to timestamps so
the interesting lines get filtered out of the LEFT JOIN's result and
can't be counted.

The folowing aproach works AFAIK.

SELECT user_name, log_type_id, COUNT(log_type_fk)
FROM (users CROSS JOIN log_type)
LEFT JOIN
(
SELECT user_fk, log_type_fk
FROM log
WHERE (ts BETWEEN sometime AND another)
) AS x
ON (user_id = user_fk AND log_type_id = log_type_fk)
GROUP BY user_name, log_type_id
ORDER BY user_name, log_type_id

This subselect gives a really considerable speed up, too.

> While it is formally assumed that user_id and log_type_id are
> respectively keys for users and log_type tables, it is semantically
> admitted here that user_name identifies user_id in users table and
> log_type identifies log_type_id in log_type table.
Actually I didn't consider this key issue, yet. :}
But, as the result gets stuffed in the crosstab function, the formally strict key log_type_id would do as column 2 as well, or maybe better being a number instead of a varchar.
I'll even change the first column to user_id and wrap another JOIN users around so I don't get messed up by users with the same name.

Thank you and Andreas Krestchmer and Glenn Maynard for giving me a new view on things.
:)

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Glenn Maynard 2009-07-26 21:21:50 Re: Bit by "commands ignored until end of transaction block" again
Previous Message nha 2009-07-25 00:26:38 Re: Need magical advice for counting NOTHING