RE: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables

From: "Pierson Patricia L (Contractor)" <Patricia(dot)L(dot)Pierson(at)irs(dot)gov>
To: "gogala(dot)mladen(at)gmail(dot)com" <gogala(dot)mladen(at)gmail(dot)com>, "michaeldba(at)sqlexec(dot)com" <michaeldba(at)sqlexec(dot)com>
Cc: "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: RE: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables
Date: 2022-07-12 18:13:08
Message-ID: SJ0PR09MB6061D637A2A7988D19DC506FC7869@SJ0PR09MB6061.namprd09.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello,

Do a count on the primary key. Will force index access and you don’t access
the entire row which may be very long.

LIKE : select count(ID) from my_table;

From: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
Sent: Tuesday, July 12, 2022 11:58 AM
To: MichaelDBA Vitale <michaeldba(at)sqlexec(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins
for some large tables

What's wrong with parallelism? That's why it was invented. If you really need
an accurate count at moment's notice, create a trigger to maintain it.

Regards

On Tue, Jul 12, 2022, 10:31 AM MichaelDBA Vitale <michaeldba(at)sqlexec(dot)com
<mailto:michaeldba(at)sqlexec(dot)com> > wrote:

Perhaps do an analyze on the table and then select reltuples from pg_class for
that table. Might be faster than the select count(*).

Regards,

Michael Vitale

On 07/12/2022 8:51 AM Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com
<mailto:gogala(dot)mladen(at)gmail(dot)com> > wrote:

On 7/11/22 03:23, Florents Tselai wrote:

psql “select id from my_table" | sort -u | wc -l

That will be a lot slower than just "select count(*) from my_table". You are
delivering data to the user program (psql) and then shipping them to pipe and
then processing the output with "wc". Depending on the version, PostgreSQL has
very reliable parallelism and can do counting rather quickly. The speed of
"select count(*) from my_table" depends on the speed of I/O. Since the table
is big, it cannot be cached in the file system cache, so all that you have at
your disposal is the raw disk speed. For the smaller machines, NVME is the
king. For larger rigs, you should consider something like Pure, XTremIO or
NetApp SolidFire. People frequently expect database to do miracles with under
par hardware.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Holger Jakobs 2022-07-12 18:25:32 Re: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables
Previous Message Mladen Gogala 2022-07-12 17:57:42 Re: Improve "select count(*)" query - takes more than 30 mins for some large tables