Re: count * performance issue

From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count * performance issue
Date: 2008-03-08 08:04:31
Message-ID: 47D2488F.1000801@tweakers.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6-3-2008 16:28 Craig James wrote:
> On the one hand, I understand that Postgres has its architecture, and I
> understand the issue of row visibility, and so forth. On the other
> hand, my database is just sitting there, nothing going on, no
> connections except me, and... it takes FIFTY FIVE SECONDS to count 20
> million rows, a query that either Oracle or MySQL would answer in a
> fraction of a second. It's hard for me to believe there isn't a better
> way.

Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore
relatively large. The hardware is quite beefy for a forum however (4
3Ghz cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same amount of records as the
above and was before this query probably much less present in the
innodb-buffer (but also less frequently touched by other queries), we
see this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index,
which for those tables is more or less identical.

Apparently the time is still table-size related, not necessarily
tuple-count related. As this shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this
was on another database on a system with somewhat different specs (8
2.6Ghz cores, 16GB, 7+7+1 raid50), used a non-primary index and I have
no idea how well that index was in the system's cache prior to this query.

Repeating it makes it do that query in 6.65 seconds, repeating the
12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems
table-size related. As soon as the index it uses fits in the cache or it
doesn't have to use the primary index, it might be a different story,
but when the table(index) is too large to fit, it is quite slow.
Actually, it doesn't appear to be much faster than Postgresql's (8.2)
table-based counts. If I use a much slower machine (2 2Ghz opterons, 8GB
ddr memory, 5+1 old 15k rpm scsi disks in raid5) with a 1GB, 13M record
table wich is similar to the above articlestats, it is able to return a
count(*) in 3 seconds after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the
query-cache at work or where using myisam. Or perhaps with a fast
system, you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2008-03-08 14:17:50 Re: Re: Confirmação de envio / Sending confirmation (captchaid:13266b402f09)
Previous Message paul rivers 2008-03-08 07:11:19 Re: count * performance issue