Re: Slow count(*) again...

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "david(at)lang(dot)hm" <david(at)lang(dot)hm>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 17:36:46
Message-ID: 4CB49CAE.8060106@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

david(at)lang(dot)hm wrote:
> On Tue, 12 Oct 2010, Mladen Gogala wrote:
>
>
>> david(at)lang(dot)hm wrote:
>>
>>> from a PR point of view, speeding up the trivil count(*) case could be
>>> worth it, just to avoid people complaining about it not being fast.
>>>
>>>
>>>
>> Fixing PR stuff is not the approach that I would take. People are complaining
>> about select count(*) because they're using it in all the wrong places.
>>
>
> that may be the case, but if it's possible to make it less painful it will
> mean more people use postgres, both because it works better for them when
> they are using the suboptimal programs, but also because when people do
> their trivial testing of databases to decide which one they will use, they
> won't rule out postgres because "it's so slow"
>
>

There is no free lunch. If the count field is maintained somewhere, the
concurrency will suffer. I find the idea of fixing the "count delusion"
ridiculous, may Richard Dawkins forgive me for this pun. Saying that
something is slow without testing and a proper
consideration is ridiculous. As a DBA, I usually get complaints like
"the database is slow today" 3 times before lunch, every day. The
database is never slow, the database is a warehouse where you keep your
data. What is slow is the access to the data, and that is done by, guess
what, the application program. Almost always, it's the application
that's slow, not the database. As for the "select count(*)", idiom, what
are you trying to do? Where are you using it? If you are using it for
pagination, consider the possibility of not specifying
the number of pages on the website, just the links "next -->" and "prev
<--". Alternatively, you can fetch a small amount into the web page and
direct the users who would like to see the complete information to a
background reporting too. Mixing batch reports and online reports is a
very easy thing to do. If you are using it to establish existence,
you're doing it wrong. I've had a problem like that this morning. A
developer came to me with the usual phrase that the "database is slow".
It was a PHP form which should write an output file and let the user
know where the file is. The function looks like this:

function put_xls($sth) {
global $FNAME;
$FNAME=$FNAME.".xls";
$lineno=0;
$ncols=$sth->FieldCount();
for ($i = 0;$i <= $ncols;$i++) {
$cols[$i] = $sth->FetchField($i);
$colnames[$i]=$cols[$i]->name;
}
$workbook = new Spreadsheet_Excel_Writer("/software$FNAME");
$format_bold =& $workbook->addFormat();
$format_bold->setBold();
$format_bold->setAlign('left');
$format_left =& $workbook->addFormat();
$format_left->setAlign('left');
$worksheet = & $workbook->addWorksheet('Moreover Search');
$worksheet->writeRow($lineno++,0,$colnames,$format_bold);
while($row=$sth->FetchRow()) {
$worksheet->writeRow($lineno++,0,$row,$format_left);
}
$workbook->close();
$cnt=$sth->Recordcount();
return($cnt);
}

The relevant includes are here:

require ('Date.php');
require ('adodb5/tohtml.inc.php');
require_once ('adodb5/adodb.inc.php');
require_once ('adodb5/adodb-exceptions.inc.php');
require_once 'Spreadsheet/Excel/Writer.php';
$ADODB_FETCH_MODE = ADODB_FETCH_NUM;

So, what is the problem here? Why was the "database slow"? As it turns
out, the PEAR module for writing Excel spreadsheets, which is the tool
used here, creates the entire spreadsheet in memory and writes it out
on the "close" command. What was spinning was "httpd" process, the
database was completely and utterly idle, rolling thumbs and awaiting
orders. Using the "fputcsv" instead, made the function fly. The only
thing that was lost were the bold column titles. Changing little things
can result in the big performance gains. Making "select count(*)"
unnaturally fast would be tending to bad programming practices. I am
not sure that this is a desirable development. You can't expect people
to adjust the database software to your application. Applications are
always database specific. Writing an application that will access a
PostgreSQL database is not the same as writing an application that will
access an Oracle database.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2010-10-12 18:03:29 Re: [JDBC] Support for JDBC setQueryTimeout, et al.
Previous Message Peter Eisentraut 2010-10-12 17:31:26 Re: Bug in information_schema: column names don't match spec

Browse pgsql-performance by date

  From Date Subject
Next Message Jesper Krogh 2010-10-12 18:22:01 Re: Slow count(*) again...
Previous Message Jon Nelson 2010-10-12 17:24:47 read only transactions