Quick Links

Re: Does auto-analyze work on dirty writes?

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Conor Walsh <ctw(at)adverb(dot)ly>, jd(at)commandprompt(dot)com, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Does auto-analyze work on dirty writes?
Date:	2011-02-10 17:51:31
Message-ID:	AANLkTimeTk_reda4x3WwsgOfXN6NO6Be63YyjJ5UnZLz@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 8:50 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> On 02/04/2011 10:41 AM, Tom Lane wrote:
>>
>> 1. Autovacuum fires when the stats collector's insert/update/delete
>> counts have reached appropriate thresholds. Those counts are
>> accumulated from messages sent by backends at transaction commit or
>> rollback, so they take no account of what's been done by transactions
>> still in progress.
>>
>> 2. Only live rows are included in the stats computed by ANALYZE.
>> (IIRC it uses SnapshotNow to decide whether rows are live.)
>>
>> Although the stats collector does track an estimate of the number of
>> dead rows for the benefit of autovacuum, this isn't used by planning.
>> Table bloat is accounted for only in terms of growth of the physical
>> size of the table in blocks.
>
> Thanks, Tom.
>
> Does this un-analyzed "bloat" not impact queries? I guess the worst case
> here is if autovaccum is disabled for some reason and 99% of the table is
> dead rows. If I understand the above correctly, I think analyze might
> generate a bad plan under this scenario, thinking that a value is unique,
> using the index - but every tuple in the index has the same value and each
> has to be looked up in the table to see if it is visible?

It sounds like you're describing something like a one-row table with a
unique index on one of its column, getting updates that can't be made
HOT, and not getting vacuumed. That scenario does suck - I had a test
case I was using it a while back that generated something similar -
but I'm not sure how much it's worth worrying about the plan, because
either an index scan or a sequential scan is going to be awful.

To put that another way, I've founded that the optimizer copes pretty
well with adjusting plans as tables get bloated - mostly by using
index scans rather than sequential scans. It's possible there is some
improvement still to be had there, but I would be a lot more
interested in fixing the bloat, at least based on my own experiences.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Does auto-analyze work on dirty writes? at 2011-02-05 01:50:13 from Mark Mielke

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Greg Smith	2011-02-10 17:56:10	Re: Why we don't want hints Was: Slow count(*) again...
Previous Message	Kevin Grittner	2011-02-10 17:44:29	Re: Why we don't want hints Was: Slow count(*) again...

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Greg Smith	2011-02-10 17:56:10	Re: Why we don't want hints Was: Slow count(*) again...
Previous Message	Kevin Grittner	2011-02-10 17:44:29	Re: Why we don't want hints Was: Slow count(*) again...