RE: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables

From: MichaelDBA Vitale <michaeldba(at)sqlexec(dot)com>
To: "Pierson Patricia L (Contractor)" <Patricia(dot)L(dot)Pierson(at)irs(dot)gov>, "gogala(dot)mladen(at)gmail(dot)com" <gogala(dot)mladen(at)gmail(dot)com>
Cc: "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: RE: [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables
Date: 2022-07-12 18:25:45
Message-ID: 1037158693.49441.1657650345834@email.ionos.com
Lists: pgsql-admin

<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<div>
That is not true: doing the select on the primary key will still result in a table scan, not an index scan.&nbsp; The heap always gets accessed for select counts.
</div>
<div class="default-style">
<br>
</div>
<div class="default-style">
Regards,
</div>
<div class="default-style">
Michael Vitale
</div>
<blockquote type="cite">
<div>
On 07/12/2022 2:13 PM Pierson Patricia L (Contractor) &lt;patricia(dot)l(dot)pierson(at)irs(dot)gov&gt; wrote:
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 12.0pt; font-family: 'Verdana',sans-serif;">Hello,</span></p>
<p class="MsoNormal"><span style="font-size: 12.0pt; font-family: 'Verdana',sans-serif;">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 12.0pt; font-family: 'Verdana',sans-serif;">Do a count on the primary key.&nbsp; That will force index access, and you don’t read the entire row, which may be very long.</span></p>
<p class="MsoNormal"><span style="font-size: 12.0pt; font-family: 'Verdana',sans-serif;">For example: select count(ID) from my_table;</span></p>
<p class="MsoNormal"><span style="font-size: 12.0pt; font-family: 'Verdana',sans-serif;">&nbsp;</span></p>
<div style="border: none; border-top: solid #E1E1E1 1.0pt; padding: 3.0pt 0in 0in 0in;">
<p class="MsoNormal"><strong>From:</strong> Mladen Gogala &lt;gogala(dot)mladen(at)gmail(dot)com&gt; <br><strong>Sent:</strong> Tuesday, July 12, 2022 11:58 AM<br><strong>To:</strong> MichaelDBA Vitale &lt;michaeldba(at)sqlexec(dot)com&gt;<br><strong>Cc:</strong> pgsql-admin(at)lists(dot)postgresql(dot)org<br><strong>Subject:</strong> [EXT] Re: Improve "select count(*)" query - takes more than 30 mins for some large tables</p>
</div>
<p class="MsoNormal"><br></p>
<div>
<p class="MsoNormal">What's wrong with parallelism? That's why it was invented. If you really need an accurate count at a moment's notice, create a trigger to maintain it.</p>
<div>
<p class="MsoNormal">Regards</p>
</div>
</div>
<p class="MsoNormal"><br></p>
<div>
<div>
<p class="MsoNormal">On Tue, Jul 12, 2022, 10:31 AM MichaelDBA Vitale &lt;<a href="mailto:michaeldba(at)sqlexec(dot)com">michaeldba(at)sqlexec(dot)com</a>&gt; wrote:</p>
</div>
<blockquote>
<div>
<div>
<p class="MsoNormal">Perhaps run ANALYZE on the table and then select reltuples from pg_class for that table.&nbsp; That might be faster than the select count(*).</p>
</div>
<div>
<p class="MsoNormal"><br></p>
</div>
<div>
<p class="MsoNormal">Regards,</p>
</div>
<div>
<p class="MsoNormal">Michael Vitale</p>
</div>
<div>
<p class="MsoNormal"><br></p>
</div>
<blockquote>
<div>
<p class="MsoNormal">On 07/12/2022 8:51 AM Mladen Gogala &lt;<a target="_blank" href="mailto:gogala(dot)mladen(at)gmail(dot)com" rel="noopener">gogala(dot)mladen(at)gmail(dot)com</a>&gt; wrote:</p>
</div>
<div>
<p class="MsoNormal"><br></p>
</div>
<div>
<p class="MsoNormal"><br></p>
</div>
<div>
<p class="MsoNormal">On 7/11/22 03:23, Florents Tselai wrote:</p>
</div>
<blockquote>
<pre>psql -Atc "select id from my_table" | sort -u | wc -l</pre>
</blockquote>
<p>That will be a lot slower than just "select count(*) from my_table".&nbsp; You are delivering the data to the user program (psql), shipping it through a pipe, and then processing the output with "wc". Depending on the version, PostgreSQL has very reliable parallelism and can do counting rather quickly. The speed of "select count(*) from my_table" depends on the speed of I/O. Since the table is big, it cannot be cached in the file system cache, so all that you have at your disposal is the raw disk speed. For smaller machines, NVMe is king. For larger rigs, you should consider something like Pure, XtremIO or NetApp SolidFire. People frequently expect the database to do miracles with subpar hardware.</p>
<pre>-- </pre>
<pre>Mladen Gogala</pre>
<pre>Database Consultant</pre>
<pre>Tel: (347) 321-1217</pre>
<pre><a target="_blank" href="https://dbwhisperer.wordpress.com" rel="noopener">https://dbwhisperer.wordpress.com</a></pre>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>
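
The trigger-maintained count suggested earlier in the thread can be sketched roughly as follows. This is only an illustration, not anyone's production setup: it uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs without a server, and the names my_table_rowcount, my_table_count_ins, my_table_count_del, fast_count and full_count are made up for the example. In PostgreSQL the same idea would need a trigger function, and a single counter row would likely become a point of contention under many concurrent writers.

```python
import sqlite3


def make_counted_table(conn):
    """Create my_table plus a one-row side table kept in sync by triggers."""
    conn.executescript("""
        CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT);

        -- Hypothetical side table holding the current row count.
        CREATE TABLE my_table_rowcount (n INTEGER NOT NULL);
        INSERT INTO my_table_rowcount (n) VALUES (0);

        -- SQLite triggers fire once per affected row.
        CREATE TRIGGER my_table_count_ins AFTER INSERT ON my_table
        BEGIN
            UPDATE my_table_rowcount SET n = n + 1;
        END;

        CREATE TRIGGER my_table_count_del AFTER DELETE ON my_table
        BEGIN
            UPDATE my_table_rowcount SET n = n - 1;
        END;
    """)


def fast_count(conn):
    """Read the maintained counter: one tiny row, no scan of my_table."""
    return conn.execute("SELECT n FROM my_table_rowcount").fetchone()[0]


def full_count(conn):
    """The expensive baseline the thread is about."""
    return conn.execute("SELECT count(*) FROM my_table").fetchone()[0]


conn = sqlite3.connect(":memory:")
make_counted_table(conn)
conn.executemany("INSERT INTO my_table (payload) VALUES (?)", [("row",)] * 1000)
conn.execute("DELETE FROM my_table WHERE id <= 250")
conn.commit()

# Both numbers should agree; only the second one has to scan the table.
print(fast_count(conn), full_count(conn))
```

The trade-off is that every insert and delete pays for one extra update, in exchange for a count that can be read without touching the large table.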
