Re: Fixes for two separate bugs in nbtree VACUUM's page deletion

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Date: 2020-04-30 07:20:18
Message-ID: CA+fd4k6wCqk-YihAdXk-a6d+o8q8WJ=XiY7yVfOBEooYMENc2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 30 Apr 2020 at 07:29, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Apr 28, 2020 at 12:21 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > For the first fix it seems better to push down the logic to the page
> > deletion code as your 0001 patch does so. The following change changes
> > the page deletion code so that it emits LOG message indicating the
> > index corruption when a deleted page is passed. But we used to ignore
> > in this case.
>
> Attached is v2 of the patch set, which includes a new patch that I
> want to target the master branch with. This patch (v2-0003-*)
> refactors btvacuumscan().
>
> This revision also changed the first bugfix patch. I have simplified
> some of the details in nbtree.c that were added by commit 857f9c36cda.
> Can't we call _bt_update_meta_cleanup_info() at a lower level, like in
> the patch? AFAICT it makes more sense to just call it in
> btvacuumscan(). Please let me know what you think of those changes.

Yes, I agree.

>
> The big change in the new master-only refactoring patch (v2-0003-*) is
> that I now treat a call to btvacuumpage() that has to
> "backtrack/recurse" and doesn't find a page that looks like the
> expected right half of a page split (a page split that occurred since
> the VACUUM operation began) as indicating corruption. This corruption
> is logged. I believe that we should never see this happen, for reasons
> that are similar to the reasons why _bt_pagedel() never finds a
> deleted page when moving right, as I went into in my recent e-mail to
> this thread. I think that this is worth tightening up for Postgres 13.
>
> I will hold off on committing v2-0003-*, since I need to nail down the
> reasoning for treating this condition as corruption. Plus it's not
> urgent. I think that the general direction taken in v2-0003-* is the
> right one, in any case. The "recursion" in btvacuumpage() doesn't make
> much sense -- it obscures what's really going on IMV. Also, using the
> variable name "scanblkno" at every level -- btvacuumscan(),
> btvacuumpage(), and _bt_pagedel() -- makes it clear that it's the same
> block at all levels of the code. And that the "backtrack/recursion"
> case doesn't kill tuples from the "scanblkno" block (and doesn't
> attempt to call _bt_pagedel(), which is designed only to handle blocks
> passed by btvacuumpage() when they are the current "scanblkno"). It
> seems unlikely that the VACUUM VERBOSE pages deleted accounting bug
> would have happened if the code was already structured this way.

+1 for refactoring. I often get confused that btvacuumpage() takes two
block numbers (blkno and orig_blkno) in spite of the same value. Your
change also makes it clear.

For the part of treating that case as an index corruption I will need
some time to review because of lacking knowledge of btree indexes. So
I'll review it later.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-04-30 07:51:06 Re: Improve errors when setting incorrect bounds for SSL protocols
Previous Message Julien Rouhaud 2020-04-30 07:18:57 Re: WAL usage calculation patch