From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Append subplan order instability on aye-aye |
Date: | 2019-07-17 23:12:32 |
Message-ID: | 20190717231232.ssx6fxay756exn6a@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2019-07-17 11:53:48 -0400, Tom Lane wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > Surely it can't be that since that just sets what *pages gets set to.
> > Tom mentioned that following was returning 0 pages and tuples:
>
> > -- Temporary hack to investigate whether extra vacuum/analyze is happening
> > select relname, relpages, reltuples
> > from pg_class
> > where relname like '__star' order by relname;
> > relname | relpages | reltuples
> > ---------+----------+-----------
> > a_star | 1 | 3
>
> I poked around a little and came up with a much simpler theory:
> VACUUM will not change relpages/reltuples if it does not scan any pages
> (cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
> in HEAD's vacuumlazy.c). And, because sanity_check.sql's VACUUM is a
> plain unaggressive vacuum, all that it takes to make it skip over a_star's
> one page is for somebody else to have a pin on that page.
I wonder if we could set log_min_messages to DEBUG2 on occasionally
failing machines to test that theory. That ought to hit
    appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
                                    "Skipped %u pages due to buffer pins, ",
                                    vacrelstats->pinskipped_pages),
    ...
    ereport(elevel,
            (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
                    RelationGetRelationName(onerel),
                    tups_vacuumed, num_tuples,
                    vacrelstats->scanned_pages, nblocks),
             errdetail_internal("%s", buf.data)));
> So a chance
> collision with the bgwriter or checkpointer could cause the observed
> symptom, not just for a_star but for the other single-page relations that
> are at stake here. Those pages are certainly dirty after create_misc.sql,
> so it's hardly implausible for one of these processes to be holding pin
> while trying to write out the buffer at the time sanity_check.sql runs.
>
> A brute-force way to fix this (or at least reduce the odds quite a bit)
> would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
> thereby guaranteeing that none of these pages are still in need of being
> written. Not sure how much that'd penalize the regression tests' runtime,
> or whether we'd have a loss of test coverage of VACUUM behaviors.
Alternatively, we could VACUUM FREEZE the relevant tables? That then
ought to hit the blocking codepath in lazy_scan_heap()?
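To make the theory concrete, here is a toy model in C. It is not code from
vacuumlazy.c: ToyPage, ToyRel and toy_vacuum are made-up names, and the model
assumes that a non-aggressive vacuum simply skips any page somebody else has
pinned, while an aggressive one (as with VACUUM FREEZE) waits and then scans
it. The real code has more cases than this, so treat it as a sketch of the
argument only.

```c
#include <stdbool.h>

/* Toy stand-ins for illustration; none of these are PostgreSQL types. */
typedef struct ToyPage
{
    int  ntuples;          /* live tuples on the page */
    bool pinned_by_other;  /* e.g. bgwriter/checkpointer holds a pin */
} ToyPage;

typedef struct ToyRel
{
    const char *name;
    ToyPage    *pages;
    unsigned    nblocks;
    unsigned    relpages;   /* what pg_class.relpages would show */
    double      reltuples;  /* what pg_class.reltuples would show */
} ToyRel;

/*
 * Model of the theory: returns the number of pages skipped due to pins.
 * The stored stats are only overwritten if at least one page was scanned.
 */
unsigned
toy_vacuum(ToyRel *rel, bool aggressive)
{
    unsigned scanned = 0;
    unsigned pinskipped = 0;
    double   tuples = 0;

    for (unsigned blk = 0; blk < rel->nblocks; blk++)
    {
        ToyPage *page = &rel->pages[blk];

        if (page->pinned_by_other && !aggressive)
        {
            /* Non-aggressive: don't wait for the pin, just move on. */
            pinskipped++;
            continue;
        }

        /* Assumption: an aggressive vacuum waits for the pin, then scans. */
        scanned++;
        tuples += page->ntuples;
    }

    /* Nothing scanned: leave the old relpages/reltuples untouched. */
    if (scanned > 0)
    {
        rel->relpages = rel->nblocks;
        rel->reltuples = tuples;  /* toy: no extrapolation for skipped pages */
    }

    return pinskipped;
}
```

With a one-page relation holding 3 tuples, stored stats of 0/0, and the page
pinned, toy_vacuum(&rel, false) reports one pin-skipped page and leaves the
stats at 0/0, which is the symptom in the failing runs. Calling
toy_vacuum(&rel, true) on the same relation sets them to 1 and 3.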
Greetings,
Andres Freund