From: | "asdf asdasfa" <sjg(at)email(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Performance Issues with count() |
Date: | 2002-04-23 22:06:54 |
Message-ID: | 20020423220654.6998.qmail@email.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
<DIV><FONT face=Arial size=2>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>Hi,</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>Ive found some performance issues with Postgres that Im hoping people on this list can help resolve.<SPAN style="mso-spacerun: yes"> </SPAN>Were working with a 65 million record table that includes year of birth (data type INT). To count the frequency of dates in the table, it takes 2 hours 26 minutes to execute. (Theres an approximately 100-year range of dates in the 65 million records).</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"># EXPLAIN SELECT yb, count(1) FROM data_table GROUP BY yb;<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">NOTICE:<SPAN style="mso-spacerun: yes"> </SPAN>QUERY PLAN:<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">Aggregate (cost=16397434.27..16723548.69 rows=6522288 width=4)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>-> Group (cost=16397434.27..16560491.48 rows=65222884 width=4)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>-> Sort (cost=16397434.27..16397434.27 rows=65222884 width=4)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>-> Seq Scan on data_table (cost=0.00..2368620.84 rows=65222884 width=4)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><o:p> </o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>I can count data from the flat text data file with this Perl script:</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>#!/usr/bin/perl<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN># script to count YB frequencies in flat data file<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>open (IN, "$ARGV[0]");<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>open (OUT, ">$ARGV[0]\_cnt");<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>while (<IN>) {<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>chomp;<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>$years{$_}++;<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>}<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>foreach $key (keys %years) {<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>print OUT "$key,$years{$key}\n";<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>}<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>The Perl script takes *1 <B style="mso-bidi-font-weight: normal"><I style="mso-bidi-font-style: normal">minute</I></B>*, 31 seconds to run.<SPAN style="mso-spacerun: yes"> </SPAN>Why is there such a discrepancy in times?<SPAN style="mso-spacerun: yes"> </SPAN>Ive noticed that the Postgres </FONT><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">count()</SPAN></B><FONT face="Times New Roman" size=3> function takes what seems to be longer than it should in other cases as well. <SPAN style="mso-spacerun: yes"> </SPAN>For instance, counting the frequency of last names in the same 65 million record table took *1 hour* and 31 minutes:</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"># EXPLAIN SELECT ln, count(1) FROM data_table GROUP BY ln;<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">NOTICE:<SPAN style="mso-spacerun: yes"> </SPAN>QUERY PLAN:<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">Aggregate (cost=19538149.27..19864263.69 rows=6522288 width=19)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>-><SPAN style="mso-spacerun: yes"> </SPAN>Group (cost=19538149.27..19701206.48 rows=65222884 width=19)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>-><SPAN style="mso-spacerun: yes"> </SPAN>Sort (cost=19538149.27..19538149.27 rows=65222884 width=19)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>-><SPAN style="mso-spacerun: yes"> </SPAN>Seq Scan on data_table (cost=0.00..2324610.84 rows=65222884 width=19)<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>The last name (</FONT><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">ln</SPAN></B><FONT face="Times New Roman" size=3>) and the year of birth (</FONT><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">yb</SPAN></B><FONT face="Times New Roman" size=3>) is indexed, but that shouldnt matter because its doing a sequential scan, correct?<SPAN style="mso-spacerun: yes"> </SPAN>Am I running into the limitations of Postgres?<SPAN style="mso-spacerun: yes"> </SPAN>Wed like to eventually get this system into production, but if we cant get Postgres to </FONT><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">count()</SPAN></B><FONT face="Times New Roman" size=3> faster, we may not be able to use it.</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>Heres the data_table schema:</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"># \d data_table<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>Table "data_table"<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>Column |<SPAN style="mso-spacerun: yes"> </SPAN>Type<SPAN style="mso-spacerun: yes"> </SPAN>| Modifiers <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">--------+---------------+-----------<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>ss<SPAN style="mso-spacerun: yes"> </SPAN>| character(9)<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>ln<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>fn<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>mi<SPAN style="mso-spacerun: yes"> </SPAN>| character(1)<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>ns<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>lny<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>fny<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>sny<SPAN style="mso-spacerun: yes"> </SPAN>| character(15) | <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>g<SPAN style="mso-spacerun: yes"> </SPAN>| character(1)<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>mb<SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>db<SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>yb<SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>md<SPAN style="mso-spacerun: yes"> </SPAN><SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>dd<SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>yd<SPAN style="mso-spacerun: yes"> </SPAN>| integer<SPAN style="mso-spacerun: yes"> </SPAN>| <o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">Indexes: ssdi_ss_idx<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'"><SPAN style="mso-spacerun: yes"> </SPAN>ssdi_ln_idx<o:p></o:p></SPAN></B></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>Were working with Postgres v 7.2. The machine is a dual-processor Athlon MP1900 (Tyan Tiger board) with 3GB of PC2100 DDR RAM, and 3-80GB IBM 120GXP hard drives configured in a software RAID 0 Array running under RedHat Linux v. 7.2.</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT face="Times New Roman" size=3>Weve </FONT><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: 'Courier New'">VACUUM ANALYZE</SPAN></B><FONT face="Times New Roman" size=3>d the tables after creating the indices.<SPAN style="mso-spacerun: yes"> </SPAN>Is there something Im missing here?</FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><o:p><FONT face="Times New Roman" size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face="Times New Roman">Thanks for you suggestions.</FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face="Times New Roman"></FONT></FONT> </P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face="Times New Roman">Shaun Grannis</FONT></FONT></P></FONT></DIV>
--
<PRE>
<p>_______________________________________________<br>
Sign-up for your own FREE Personalized E-mail at <a href="http://www.email.com/?sr=signup" target="_new"><font color="#0000FF">Email.com</font></a><br>
</PRE>
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/html | 18.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert J. Sanford, Jr. | 2002-04-23 22:13:11 | client authentication protocols? |
Previous Message | Bruce Momjian | 2002-04-23 20:58:41 | Re: Returns from ExecTuplesOk() |