Re: Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 08:38:38
Message-ID: 201010130138.39893.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 00:19:26 Mark Kirkwood wrote:
> On 13/10/10 19:47, Neil Whelchel wrote:
> > Nope...
> > So, possible conclusions are:
> > 1. Even with VACUUM database table speed degrades as tables are updated.
> > 2. Time testing on a freshly INSERTed table gives results that are not
> > real- world.
> > 3. Filesystem defragmentation helps (some).
> > 4. Cache only makes a small difference once a table has been UPDATEd.
> >
> > I am going to leave this configuration running for the next day or so.
> > This way I can try any suggestions and play with any more ideas that I
> > have. I will try these same tests on ext4 later, along with any good
> > suggested tests.
> > I will try MySQL with the dame data with both XFS and ext4.
> > -Neil-
>
> I think that major effect you are seeing here is that the UPDATE has
> made the table twice as big on disk (even after VACUUM etc), and it has
> gone from fitting in ram to not fitting in ram - so cannot be
> effectively cached anymore.
>
> This would not normally happen in real life (assuming UPDATEs only
> modify a small part of a table per transaction). However administration
> updates (e.g 'oh! - ref 1 should now be ref 2 please update
> everything') *will* cause the table size to double.
>
> This is an artifact of Postgres's non overwriting storage manager -
> Mysql will update in place and you will not see this.
>
> Try VACUUM FULL on the table and retest.
>
> regards
>
> Mark

There seems to be allot of discussion about VACUUM FULL, and its problems. The
overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong
here). It has been some time since I have read the changelogs, but I seem to
remember that there have been some major changes to VACUUM FULL recently.
Maybe this needs to be re-visited in the documentation.

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real 4m49.055s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m9.665s
user 0m0.000s
sys 0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL.
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m3.786s
user 0m0.000s
sys 0m0.000s

And the cache helps...
So, we are right back to within 10ms of where we started after INSERTing the
data, but it took a VACUUM FULL to accomplish this (by making the table fit in
RAM).
This is a big problem on a production machine as the VACUUM FULL is likely to
get in the way of INSERTing realtime data into the table.

So to add to the conclusion pile:
5. When you have no control over the WHERE clause which may send count(*)
through more rows of a table that would fit in RAM your performance will be
too slow, so count is missing a LIMIT feature to avoid this.
6. Keep tables that are to be updated frequently as narrow as possible: Link
them to wider tables to store the columns that are less frequently updated.

So with our conclusion pile so far we can deduce that if we were to keep all
of our data in two column tables (one to link them together, and the other to
store one column of data), we stand a much better chance of making the entire
table to be counted fit in RAM, so we simply apply the WHERE clause to a
specific table as opposed to a column within a wider table... This seems to
defeat the entire goal of the relational database...

-Neil-

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mladen Gogala 2010-10-13 08:40:53 Re: Slow count(*) again...
Previous Message Markus Wanner 2010-10-13 08:05:14 Re: Issues with Quorum Commit

Browse pgsql-performance by date

  From Date Subject
Next Message Mladen Gogala 2010-10-13 08:40:53 Re: Slow count(*) again...
Previous Message Craig Ringer 2010-10-13 08:11:48 Re: SQL functions vs. PL/PgSQL functions