From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-17 16:54:43
Message-ID: 52B081D3.9090907@vmware.com
Lists: pgsql-hackers

On 12/17/2013 12:06 AM, Jeff Janes wrote:
> On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>
>> I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
>> very easy, patch attached. Your mileage may vary, but I'm seeing a nice
>> gain from this on my laptop. Taking a 30000 page sample of a table with
>> 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds
>> without the patch, and less than a second with the patch, with
>> effective_io_concurrency=10. If anyone with a good test data set loaded
>> would like to test this and post some numbers, that would be great.
>
> Performance is often chaotic near transition points, so I try to avoid data
> sets that are slightly bigger or slightly smaller than RAM (or some other
> limit).
>
> Do you know how many io channels your SSD has (or whatever the term of art
> is for SSD drives)?
No idea. It's an Intel 335.
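
For anyone who hasn't looked at the patch: the gist is just to tell the
kernel about all the blocks in the sample before reading any of them, so
the reads can overlap. The snippet below is a simplified illustration of
that idea, not the patch itself; the flat pread() loop, the fixed 8k block
size and the missing error handling are all simplifications, and it also
ignores effective_io_concurrency, which the patch uses to control the
prefetching.

```c
/*
 * Simplified sketch of prefetching a block sample with posix_fadvise().
 * Illustration only; not the actual ANALYZE code.
 */
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192

static void
read_sample_blocks(int fd, const long *blocks, int nblocks)
{
	char		buf[BLOCK_SIZE];
	int			i;

	/* Issue all the prefetch hints up front... */
	for (i = 0; i < nblocks; i++)
		(void) posix_fadvise(fd, (off_t) blocks[i] * BLOCK_SIZE,
							 BLOCK_SIZE, POSIX_FADV_WILLNEED);

	/* ...then read the blocks; the kernel can fetch them in parallel. */
	for (i = 0; i < nblocks; i++)
	{
		if (pread(fd, buf, BLOCK_SIZE, (off_t) blocks[i] * BLOCK_SIZE) < 0)
			abort();			/* error handling elided */
		/* examine the sampled rows on the page here */
	}
}
```
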
> On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB)
> with 4 GB of RAM goes from ~106 seconds to ~19 seconds.
>
> However, I'm not sure what problem we want to solve here.
The case that Greg Stark mentioned in the email starting this thread is
doing a database-wide ANALYZE after an upgrade. In that use case, you
certainly want to get it done as quickly as possible, using all the
available resources.
> I certainly would not wish to give a background maintenance process
> permission to confiscate my entire RAID throughput for its own
> operation.
Then don't set effective_io_concurrency. If you're worried about that,
you probably wouldn't want any other process to monopolize the RAID
array either.
> Perhaps this could only be active for explicit analyze, and only if
> vacuum_cost_delay=0?
That would be a bit weird, because ANALYZE in general doesn't obey
vacuum_cost_delay. Maybe it should, though...
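
To spell out what obeying it would mean: the vacuum-style throttling is
essentially a cost balance that triggers a nap whenever it exceeds a limit.
This is a much-simplified sketch of that idea, not the actual
vacuum_delay_point() code; the GUC-like variables and the fixed costs are
just placeholders.

```c
/* Rough sketch of vacuum-style cost accounting; illustration only. */
#include <unistd.h>

static int	vacuum_cost_delay = 20;		/* milliseconds; 0 disables */
static int	vacuum_cost_limit = 200;
static int	cost_balance = 0;

static void
analyze_delay_point(int cost_of_this_page)
{
	if (vacuum_cost_delay <= 0)
		return;					/* throttling disabled */

	cost_balance += cost_of_this_page;
	if (cost_balance >= vacuum_cost_limit)
	{
		usleep(vacuum_cost_delay * 1000L);	/* nap, then start over */
		cost_balance = 0;
	}
}
```
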
> Perhaps there should be something like "alter background role autovac set
> ...". Otherwise we are going to end up with an "autovacuum_*" shadow
> parameter for many of our parameters, see "autovacuum_work_mem" discussions.
Yeah, so it seems.
- Heikki