Re: Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 10:16:11
Message-ID: 201010130316.12922.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 01:50:23 Mark Kirkwood wrote:
> On 13/10/10 21:38, Neil Whelchel wrote:
> > So with our conclusion pile so far we can deduce that if we were to keep
> > all of our data in two column tables (one to link them together, and the
> > other to store one column of data), we stand a much better chance of
> > making the entire table to be counted fit in RAM, so we simply apply the
> > WHERE clause to a specific table as opposed to a column within a wider
> > table... This seems to defeat the entire goal of the relational
> > database...
>
> That is a bit excessive I think - a more reasonable conclusion to draw
> is that tables bigger than ram will drop to IO max speed to scan, rather
> than DIMM max speed...
>
> There are things you can do to radically improve IO throughput - e.g a
> pair of AMC or ARECA 12 slot RAID cards setup RAID 10 and tuned properly
> should give you a max sequential throughput of something like 12*100
> MB/s = 1.2 GB/s. So your example table (estimated at 2GB) so be able to
> be counted by Postgres in about 3-4 seconds...
>
> This assumes a more capable machine than you are testing on I suspect.
>
> Cheers
>
> Mark
The good ol' bruit force approach! I knew I'd see this one sooner or later.
Though I was not sure if I was going to see the 16TB of RAM suggestion first.
Seriously though, as the title of this thread suggests, everything is
relative. Sure count(*) and everything else will work faster with more system
power. It just seems to me that count(*) is slower than it could be given a
set of conditions. I started this thread because I think that there must be a
better way to count matches from an INDEXed column than shoving the entire
table through RAM (including columns that you are not interested in at the
minute). And even worse, when you have no (reasonable) control of the WHERE
clause preventing your system from thrashing for the next week because
somebody put in criteria that matched a few TB of records and there is no way
to LIMIT count(*) other than externally timing the query and aborting it if it
takes too long. Whet is needed is a way to determine how many rows are likely
to match a given WHERE clause so we can cut off useless queries, but we need a
fast count(*) for that, or a limit on the existing one... I seem to remember
saying something about an index driven estimate(*) at one point...

I might go as far as to rattle the cage of the developers to see if it makes
any sense to add some column oriented storage capability to Postgres. That
would be the hot ticket to be able to specify an attribute on a column so that
the back end could shadow or store a column in a column oriented table so
aggregate functions could work on them with good efficiency, or is that an
INDEX?

Since the thread has started, I have had people ask about different system
configurations, especially the filesystem (XFS, ext4...). I have never tested
ext4, and since we are all involved here, I thought that I could do so and
share my results for others, that is why I got into time testing stuff.
Time testing count(*) in my later postings is really not the point as count is
simply dragging the entire table off of the RAID through RAM, I can use any
other function like max()... No that can narrow down its scan with an INDEX...
Ok, sum(), there we go!

-Neil-

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2010-10-13 10:36:59 Re: wip: functions median and percentile
Previous Message Fujii Masao 2010-10-13 09:22:41 Re: Issues with Quorum Commit

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2010-10-13 10:41:52 Re: Slow count(*) again...
Previous Message Mark Kirkwood 2010-10-13 08:50:23 Re: Slow count(*) again...