Re: more problems with count(*) on large table

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Alban Hertroys <a(dot)hertroys(at)magproductions(dot)nl>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: more problems with count(*) on large table
Date: 2007-10-02 18:01:48
Message-ID: 20071002140148.6eeb99bb.wmoran@potentialtech.com
Lists: pgsql-general

In response to Mike Charnoky <noky(at)nextbus(dot)com>:

> The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM. The io
> subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
> 9500S-8 controller (Seagate Nearline ST3400632NS drives). Currently,
> shared_buffers is set to 50000 (nearly 400M)

The advice for 8.x systems has been to start shared_buffers at 1/4-1/3 of
the available RAM and fine-tune from there. Unless this machine serves
other (non-PostgreSQL) functions, you should probably raise shared_buffers
to about 2G. From there you may find that your workload benefits from even
more, or possibly less, but 400M is pretty small for a 6G system.
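
To put that in postgresql.conf terms, a rough sketch (the page count
assumes the default 8kB block size; 8.2 and later also accept memory
units directly):

  shared_buffers = 262144        # 262144 * 8kB pages = ~2GB
  # on 8.2+ you can also write it as:
  # shared_buffers = 2GB

Keep in mind that changing shared_buffers requires a server restart to
take effect.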

> As for the data stored in this large table, there are 15 columns. Each
> row takes roughly 134 bytes to store, not counting the index. So, for
> one day's worth of data we are talking about 1.5G/day (1.8G with the
> index). That's about 11.5M rows/day. Although the data isn't stored
> exactly sequentially by the indexed time field, it is pretty close.

How much other data is this server pushing around? If that one table in
that one database is all it handles, then something is wrong, as that
whole thing should be sitting in the filesystem cache all the time.
Otherwise, you have to consider what other operations need that memory
and are pushing this table out of the cache.
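
If you want hard numbers, something like this works on 8.1 and later;
'big_table' is just a placeholder for your actual table name:

  SELECT pg_size_pretty(pg_total_relation_size('big_table'));   -- heap + indexes + toast
  SELECT pg_size_pretty(pg_database_size(current_database()));  -- whole database

Comparing those against your 6G of RAM gives you an idea of how much of
the working set can realistically stay cached.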

> If it takes PG ~40 minutes to count(*) one day's worth of records, the
> avg throughput is 786k/s. Watching iostat during the count(*)
> operation, I see average read speeds in the range of 1100-1500k/s.

Could be a lot of fragmentation in that table. Keep in mind that if
you're deleting records occasionally, that free space will get reused,
which means new rows might not be inserted sequentially; they can end up
scattered all over the table.
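
One rough way to check is the planner's correlation statistic, which
tells you how closely the physical row order follows a column's order
(close to 1.0 means nearly sequential, near 0 means rows are scattered).
Table and column names here are just placeholders:

  -- substitute your real table and timestamp column names
  SELECT attname, correlation
    FROM pg_stats
   WHERE tablename = 'big_table'
     AND attname = 'event_time';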

> I guess I would expect postgres to perform a count(*) faster. When I
> run benchmarks on the machine with hdparm (with the db insert process
> running), I see the disk averages > 80MB/sec for reads
>
> # hdparm -tT /dev/sdb1
> /dev/sdb1:
> Timing cached reads: 3884 MB in 2.00 seconds = 1942.85 MB/sec
> Timing buffered disk reads: 248 MB in 3.01 seconds = 82.49 MB/sec
>
> Maybe PG has to do a lot of random disk access? I'm running bonnie++
> now to get more detailed disk performance info. As Tomasz pointed out
> maybe using CLUSTER would help, but this probably takes a long time to
> perform.

If you can spare the time, give it a try to see if it helps.
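
For reference, the rough shape of it on 8.x is below (index and table
names are placeholders); be aware that CLUSTER takes an exclusive lock
and rewrites the entire table, so schedule it around your insert process:

  -- substitute your real index and table names
  CLUSTER big_table_time_idx ON big_table;
  ANALYZE big_table;

Running ANALYZE afterward keeps the planner statistics in line with the
rewritten table.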

> Again, the only other thing happening with the db: a separate process is
> inserting data into this table. I have checkpoint_segments set to 64 so
> that pg is not constantly thrashing the disk with writes. The
> transaction log is on a separate disk.
>
>
> Mike
>
> Bill Moran wrote:
> > In response to Mike Charnoky <noky(at)nextbus(dot)com>:
> >
> >> This is strange... count(*) operations over a period of one day's worth
> >> of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
> >> first time the data is queried it takes about 40 minutes. If I try the
> >> query again, it finishes in 1-2 minutes!
> >
> > This sounds like a caching issue. My guess at what's happening is that
> > other operations are pushing this data out of the shared_buffers, so
> > when you run it, the system has to pull a bunch of tuples off the disk
> > to check them. If you run it again immediately, the tuples are still in
> > memory, and it runs very fast.
> >
> > If this is the case, you can speed up things by adding RAM/shared_buffers,
> > or by moving to faster disks. The RAM solution is going to give you the
> > biggest performance improvement.
> >
> > However, if there's enough other data on this system, you may have
> > difficulty getting enough RAM to mitigate the problem, in which case,
> > faster disks are going to be your best bet.
> >
> > How much RAM do you have, and how much of it is allocated to shared_buffers?
> > What's your IO subsystem look like?
>
>

--
Bill Moran
http://www.potentialtech.com
