From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Fixing statistics problem related to vacuum truncation termination |
Date: | 2013-04-25 23:29:39 |
Message-ID: | 1366932579.39636.YahooMailNeo@web162902.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
After reviewing the threads and thinking about the various
historical behaviors, I'm suggesting that we apply the attached,
back-patched to 9.0, to fix the unintended changes from historical
behavior related to lack of statistics generation in some cases
where statistics were generated before
b19e4250b45e91c9cbdd18d35ea6391ab5961c8d, and to generate them in
some cases where they were historically suppressed.
Prior behavior was different between a VACUUM command and
autovacuum when there was sufficient empty space at the end of a
table to trigger an attempt to truncate the table:
* For a VACUUM command, statistics were always generated on a
VACUUM ANALYZE, and truncation might or might not occur, depending
on whether it was able to immediately get an AccessExclusiveLock on
the table when it got to this phase of VACUUM processing. If it
was able to get the lock, it would hold it for as long as the
truncation took, blocking any other access to the table --
potentially for a very long time. If it was not immediately able
to get the lock, it would not attempt any truncation, so truncation
was not deterministic before the recent patch.
* For autovacuum, if it was initially unable to acquire the
AccessExclusiveLock on the table when it got to this phase no
truncation was attempted and statistics were generated. If it
acquired the lock and blocked any other process for the duration
set by deadlock_timeout all work toward truncation was discarded
and no statistics were generated. If it was able to complete the
truncation, statistics were generated. So before the recent patch
neither truncation nor statistics generation were deterministic.
Current behavior for both the VACUUM command and autovacuum is to
avoid any prolonged holding of AccessExclusiveLock on the table
when there is contention for the lock, with limited retries to
acquire or reacquire the lock to make incremental progress on
truncation. Any progress on truncating is not lost if the lock is
relinquished to allow other tasks to proceed. I'm proposing that
we don't change that part of it.
The problem is that current behavior is to skip statistics
generation if the truncation attempt is terminated due to
contention for the table lock; whereas historically that was never
skipped for the VACUUM ANALYZE command, and only sometimes skipped
when autovacuum was intending to analyze the table but was unable
to complete the truncation. The attached will not skip the analyze
step where it had historically run, and will actually allow
autovacuum to run it when the truncation attempt was started but
not able to complete. The old mechanism for terminating the
truncation attempt (a cancel signal from the blocked process) did
not allow this.
The attached is along the lines of what Tom suggested was the
minimal fix, and less drastic than what I was initially proposing
-- which was to also restore historical behavior for the VACUUM
command. After seeing how unpredictable that behavior was
regarding truncation, it doesn't seem wise to complicate the code
to try to go back to that.
I also think that the new LOG level entry about giving up on the
truncate attempt is too chatty, and we've gotten questions from
users who were somewhat alarmed by it, so I toned it down. I'm
still not sure that the logging is quite optimal yet, so any
suggestions are welcome.
The only change outside of local naming, white space, comments,
messages, and moving a couple variables into a more local scope is
this:
- if (!vacrelstats->lock_waiter_detected)
- pgstat_report_vacuum(RelationGetRelid(onerel),
- onerel->rd_rel->relisshared,
- new_rel_tuples);
- else
- vacstmt->options &= ~VACOPT_ANALYZE;
+ pgstat_report_vacuum(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ new_rel_tuples);
... which simply reverts this part to match older code.
This is being presented for discussion; I have not finished testing
it.
Comments?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
autovac-stat-fix.diff | text/x-patch | 4.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Farina | 2013-04-25 23:48:48 | Re: 9.3 release notes suggestions |
Previous Message | Tom Lane | 2013-04-25 22:04:10 | Re: Bug Fix: COLLATE with multiple ORDER BYs in aggregates |