From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-17 16:54:43
Message-ID: 52B081D3.9090907@vmware.com
Lists: pgsql-hackers

On 12/17/2013 12:06 AM, Jeff Janes wrote:
> On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>
>> I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
>> very easy, patch attached. Your mileage may vary, but I'm seeing a nice
>> gain from this on my laptop. Taking a 30000 page sample of a table with
>> 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds
>> without the patch, and less than a second with the patch, with
>> effective_io_concurrency=10. If anyone with a good test data set loaded
>> would like to test this and post some numbers, that would be great.
>
> Performance is often chaotic near transition points, so I try to avoid data
> sets that are slightly bigger or slightly smaller than RAM (or some other
> limit).
>
> Do you know how many io channels your SSD has (or whatever the term of art
> is for SSD drives)?
No idea. It's an Intel 335.
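
For anyone who hasn't looked at the patch: the gist is just to tell the
kernel about all the blocks in the sample before reading any of them, so
the reads can overlap. The snippet below is a simplified illustration of
that idea, not the patch itself; the flat pread() loop, the fixed 8k block
size and the missing error handling are all simplifications, and it also
ignores effective_io_concurrency, which the patch uses to control the
prefetching.

```c
/*
 * Simplified sketch of prefetching a block sample with posix_fadvise().
 * Illustration only; not the actual ANALYZE code.
 */
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192

static void
read_sample_blocks(int fd, const long *blocks, int nblocks)
{
	char		buf[BLOCK_SIZE];
	int			i;

	/* Issue all the prefetch hints up front... */
	for (i = 0; i < nblocks; i++)
		(void) posix_fadvise(fd, (off_t) blocks[i] * BLOCK_SIZE,
							 BLOCK_SIZE, POSIX_FADV_WILLNEED);

	/* ...then read the blocks; the kernel can fetch them in parallel. */
	for (i = 0; i < nblocks; i++)
	{
		if (pread(fd, buf, BLOCK_SIZE, (off_t) blocks[i] * BLOCK_SIZE) < 0)
			abort();			/* error handling elided */
		/* examine the sampled rows on the page here */
	}
}
```
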
> On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB)
> with 4 GB of RAM goes from ~106 seconds to ~19 seconds.
>
> However, I'm not sure what problem we want to solve here.
The case that Greg Stark mentioned in the email starting this thread is
doing a database-wide ANALYZE after an upgrade. In that use case, you
certainly want to get it done as quickly as possible, using all the
available resources.
> I certainly would not wish to give a background maintenance process
> permission to confiscate my entire RAID throughput for its own
> operation.
Then don't set effective_io_concurrency. If you're worried about that,
you probably wouldn't want any other process to monopolize the RAID
array either.
> Perhaps this could only be active for explicit analyze, and only if
> vacuum_cost_delay=0?
That would be a bit weird, because ANALYZE in general doesn't obey
vacuum_cost_delay. Maybe it should, though...
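
To spell out what obeying it would mean: the vacuum-style throttling is
essentially a cost balance that triggers a nap whenever it exceeds a limit.
This is a much-simplified sketch of that idea, not the actual
vacuum_delay_point() code; the GUC-like variables and the fixed costs are
just placeholders.

```c
/* Rough sketch of vacuum-style cost accounting; illustration only. */
#include <unistd.h>

static int	vacuum_cost_delay = 20;		/* milliseconds; 0 disables */
static int	vacuum_cost_limit = 200;
static int	cost_balance = 0;

static void
analyze_delay_point(int cost_of_this_page)
{
	if (vacuum_cost_delay <= 0)
		return;					/* throttling disabled */

	cost_balance += cost_of_this_page;
	if (cost_balance >= vacuum_cost_limit)
	{
		usleep(vacuum_cost_delay * 1000L);	/* nap, then start over */
		cost_balance = 0;
	}
}
```
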
> Perhaps there should be something like "alter background role autovac set
> ...". Otherwise we are going to end up with an "autovacuum_*" shadow
> parameter for many of our parameters, see "autovacuum_work_mem" discussions.
Yeah, so it seems.
- Heikki