From: | Greg Smith <greg(at)2ndquadrant(dot)com> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Thoughts on statistics for continuously advancing columns |
Date: | 2009-12-31 01:40:09 |
Message-ID: | 4B3C00F9.7030805@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Joshua D. Drake wrote:
> postgres=# analyze verbose test_ten_million;
> INFO: analyzing "public.test_ten_million"
> INFO: "test_ten_million": scanned 3000 of 44248 pages, containing 678000
> live rows and 0 dead rows; 3000 rows in sample, 10000048 estimated total
> rows
> ANALYZE
> Time: 20145.148 ms
>
At an ever larger table sizes, this would turn into 3000 random seeks
all over the drive, one at a time because there's no async I/O here to
queue requests better than that for this access pattern. Let's say they
take 10ms each, not an unrealistic amount of time on current hardware.
That's 30 seconds, best case, which is similar to what JD's example is
showing even on a pretty small data set. Under load it could easily
take over a minute, hammering the disks the whole time, and in a TOAST
situation you're doing even more work. It's not outrageous and it
doesn't scale linearly with table size, but it's not something you want
to happen any more than you have to either--consider the poor client who
is trying to get their work done while that is going on.
On smaller tables, you're both more likely to grab a useful next page
via readahead, and to just have the data you need cached in RAM
already. There's a couple of "shelves" in the response time to finish
ANALYZE as you exceed L1/L2 CPU cache size and RAM size, then it trails
downward as the seeks get longer and longer once the data you need is
spread further across the disk(s). That the logical beginning of a
drive is much faster than the logical end doesn't help either. I should
generate that graph again one day somewhere I can release it at...
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2009-12-31 01:48:48 | Re: exec_execute_message crash |
Previous Message | David E. Wheeler | 2009-12-31 00:41:42 | Re: Status of plperl inter-sp calling |