Re: count(*) vs count(id)

From: Hellmuth Vargas <hivs77(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Matt Zagrabelny <mzagrabe(at)d(dot)umn(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rob Sargent <robjsargent(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: count(*) vs count(id)
Date: 2021-02-02 20:45:21
Message-ID: CAN3Qy4q1VXQHbUsVH7+yKumdfyfcxQAaHTGS5XKgjZ-5PMOpRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello list

My English is not very good, so I pretend that through the examples you
understand what I intend to expose

-- Recreate the query that is supposedly wrong

select calendar.entry, count(*)
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values (1,'2020-08-28 09:44:11'::timestamp),
(2,'2020-08-28 10:44:11'::timestamp),
(3,'2020-08-29 11:44:11'::timestamp),
(4,'2020-09-01 02:44:11'::timestamp),
(5,'2020-09-02 03:44:11'::timestamp),
(6,'2020-09-02 04:44:11'::timestamp),
(7,'2020-09-03 05:44:11'::timestamp),
(8,'2020-09-04 06:44:11'::timestamp),
(10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;

-- wrong???

entry | count
------------------------+-------
2020-08-20 00:00:00-05 | 1
2020-08-21 00:00:00-05 | 1
2020-08-22 00:00:00-05 | 1
2020-08-23 00:00:00-05 | 1
2020-08-24 00:00:00-05 | 1
2020-08-25 00:00:00-05 | 1
2020-08-26 00:00:00-05 | 1
2020-08-27 00:00:00-05 | 1
2020-08-28 00:00:00-05 | 2
2020-08-29 00:00:00-05 | 1
2020-08-30 00:00:00-05 | 1
2020-08-31 00:00:00-05 | 1
2020-09-01 00:00:00-05 | 1
2020-09-02 00:00:00-05 | 2
2020-09-03 00:00:00-05 | 1
2020-09-04 00:00:00-05 | 2
2020-09-05 00:00:00-05 | 1
2020-09-06 00:00:00-05 | 1
2020-09-07 00:00:00-05 | 1
2020-09-08 00:00:00-05 | 1
2020-09-09 00:00:00-05 | 1
2020-09-10 00:00:00-05 | 1
2020-09-11 00:00:00-05 | 1
2020-09-12 00:00:00-05 | 1
2020-09-13 00:00:00-05 | 1
2020-09-14 00:00:00-05 | 1
2020-09-15 00:00:00-05 | 1

-- In the count I will only consider the records of call_records

select calendar.entry, count(call_records.*)
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values (1,'2020-08-28 09:44:11'::timestamp),
(2,'2020-08-28 10:44:11'::timestamp),
(3,'2020-08-29 11:44:11'::timestamp),
(4,'2020-09-01 02:44:11'::timestamp),
(5,'2020-09-02 03:44:11'::timestamp),
(6,'2020-09-02 04:44:11'::timestamp),
(7,'2020-09-03 05:44:11'::timestamp),
(8,'2020-09-04 06:44:11'::timestamp),
(10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;

--- perfect
entry | count
------------------------+-------
2020-08-20 00:00:00-05 | 0
2020-08-21 00:00:00-05 | 0
2020-08-22 00:00:00-05 | 0
2020-08-23 00:00:00-05 | 0
2020-08-24 00:00:00-05 | 0
2020-08-25 00:00:00-05 | 0
2020-08-26 00:00:00-05 | 0
2020-08-27 00:00:00-05 | 0
2020-08-28 00:00:00-05 | 2
2020-08-29 00:00:00-05 | 1
2020-08-30 00:00:00-05 | 0
2020-08-31 00:00:00-05 | 0
2020-09-01 00:00:00-05 | 1
2020-09-02 00:00:00-05 | 2
2020-09-03 00:00:00-05 | 1
2020-09-04 00:00:00-05 | 2
2020-09-05 00:00:00-05 | 0
2020-09-06 00:00:00-05 | 0
2020-09-07 00:00:00-05 | 0
2020-09-08 00:00:00-05 | 0
2020-09-09 00:00:00-05 | 0
2020-09-10 00:00:00-05 | 0
2020-09-11 00:00:00-05 | 0
2020-09-12 00:00:00-05 | 0
2020-09-13 00:00:00-05 | 0
2020-09-14 00:00:00-05 | 0
2020-09-15 00:00:00-05 | 0

when placing * I want to bring all the join records between both tables and
when counting them of course there will be a row for the dates 2020-08-30
, 2020-08-31 so the call_records fields are null

select *
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values (1,'2020-08-28 09:44:11'::timestamp),
(2,'2020-08-28 10:44:11'::timestamp),
(3,'2020-08-29 11:44:11'::timestamp),
(4,'2020-09-01 02:44:11'::timestamp),
(5,'2020-09-02 03:44:11'::timestamp),
(6,'2020-09-02 04:44:11'::timestamp),
(7,'2020-09-03 05:44:11'::timestamp),
(8,'2020-09-04 06:44:11'::timestamp),
(10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date

entry | id | timestamp
------------------------+----+---------------------
2020-08-20 00:00:00-05 | |
2020-08-21 00:00:00-05 | |
2020-08-22 00:00:00-05 | |
2020-08-23 00:00:00-05 | |
2020-08-24 00:00:00-05 | |
2020-08-25 00:00:00-05 | |
2020-08-26 00:00:00-05 | |
2020-08-27 00:00:00-05 | |
2020-08-28 00:00:00-05 | 1 | 2020-08-28 09:44:11
2020-08-28 00:00:00-05 | 2 | 2020-08-28 10:44:11
2020-08-29 00:00:00-05 | 3 | 2020-08-29 11:44:11
2020-08-30 00:00:00-05 | |
2020-08-31 00:00:00-05 | |
2020-09-01 00:00:00-05 | 4 | 2020-09-01 02:44:11
2020-09-02 00:00:00-05 | 5 | 2020-09-02 03:44:11
2020-09-02 00:00:00-05 | 6 | 2020-09-02 04:44:11
2020-09-03 00:00:00-05 | 7 | 2020-09-03 05:44:11
2020-09-04 00:00:00-05 | 8 | 2020-09-04 06:44:11
2020-09-04 00:00:00-05 | 10 | 2020-09-04 07:44:11
2020-09-05 00:00:00-05 | |
2020-09-06 00:00:00-05 | |
2020-09-07 00:00:00-05 | |
2020-09-08 00:00:00-05 | |
2020-09-09 00:00:00-05 | |
2020-09-10 00:00:00-05 | |
2020-09-11 00:00:00-05 | |
2020-09-12 00:00:00-05 | |
2020-09-13 00:00:00-05 | |
2020-09-14 00:00:00-05 | |
2020-09-15 00:00:00-05 | |

--- counting

select entry, count(*)
from (
select *
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values (1,'2020-08-28 09:44:11'::timestamp),
(2,'2020-08-28 10:44:11'::timestamp),
(3,'2020-08-29 11:44:11'::timestamp),
(4,'2020-09-01 02:44:11'::timestamp),
(5,'2020-09-02 03:44:11'::timestamp),
(6,'2020-09-02 04:44:11'::timestamp),
(7,'2020-09-03 05:44:11'::timestamp),
(8,'2020-09-04 06:44:11'::timestamp),
(10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
) as u
group by entry

entry | count
------------------------+-------
2020-08-20 00:00:00-05 | 1
2020-08-21 00:00:00-05 | 1
2020-08-22 00:00:00-05 | 1
2020-08-23 00:00:00-05 | 1
2020-08-24 00:00:00-05 | 1
2020-08-25 00:00:00-05 | 1
2020-08-26 00:00:00-05 | 1
2020-08-27 00:00:00-05 | 1
2020-08-28 00:00:00-05 | 2
2020-08-29 00:00:00-05 | 1
2020-08-30 00:00:00-05 | 1
2020-08-31 00:00:00-05 | 1
2020-09-01 00:00:00-05 | 1
2020-09-02 00:00:00-05 | 2
2020-09-03 00:00:00-05 | 1
2020-09-04 00:00:00-05 | 2
2020-09-05 00:00:00-05 | 1
2020-09-06 00:00:00-05 | 1
2020-09-07 00:00:00-05 | 1
2020-09-08 00:00:00-05 | 1
2020-09-09 00:00:00-05 | 1
2020-09-10 00:00:00-05 | 1
2020-09-11 00:00:00-05 | 1
2020-09-12 00:00:00-05 | 1
2020-09-13 00:00:00-05 | 1
2020-09-14 00:00:00-05 | 1
2020-09-15 00:00:00-05 | 1

El mar, 2 de feb. de 2021 a la(s) 03:31, Laurenz Albe (
laurenz(dot)albe(at)cybertec(dot)at) escribió:

> On Mon, 2021-02-01 at 19:14 -0600, Matt Zagrabelny wrote:
> > > > What is count(*) counting then? I thought it was rows.
> > >
> > > Yeah, but count(id) only counts rows where id isn't null.
> >
> > I guess I'm still not understanding it...
> >
> > I don't have any rows where id is null:
>
> Then the *result* of count(*) and count(id) will be the same.
>
> The asterisk in count(*) is misleading. Different from any other
> programming language that I know, the SQL standard has decided that
> you cannot have an aggregate function without arguments. You have
> to use the asterisk in that case.
>
> So count(*) really is count(), that is, it counts one for every
> row that it finds, no matter what the row contains.
>
> But count(id) includes a check: if "id IS NULL", it is not counted.
>
> If that condition is satisfied for all "id"s, you end up with
> the same count. But count(id) is more expensive, because it
> will perform this unnecessary NULLness check for each row.
>
> In short: use count(*) if you want to count rows, and use
> count(x) if you want to count all rows where x IS NOT NULL.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
>
>

--
Cordialmente,

Ing. Hellmuth I. Vargas S.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joao Miguel Ferreira 2021-02-02 21:58:15 permission denied to create and drop user
Previous Message Tim Cross 2021-02-02 19:14:01 Re: ransomware