From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Potential GIN vacuum bug
Date: 2015-08-17 21:41:09
Message-ID: CAMkU=1zTC_EWFeQdGjvRmL39AP2fpKM426MjcCOMxAw_FFSHWA@mail.gmail.com
Lists: pgsql-hackers
On Mon, Aug 17, 2015 at 6:23 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> On Aug 16, 2015 11:49 PM, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi> wrote:
> >
> > On 08/16/2015 12:58 AM, Jeff Janes wrote:
> >>
> >> When ginbulkdelete gets called for the first time in a VACUUM (i.e.
> >> stats == NULL), one of the first things it does is call
> >> ginInsertCleanup to get rid of the pending list. It does this in lieu
> >> of vacuuming the pending list.
> >>
> >> This is important because if there are any dead tids still in the
> >> pending list, someone else could come along during the vacuum and post
> >> the dead tids into a part of the index that VACUUM has already passed
> >> over.
> >>
> >> The potential bug is that ginInsertCleanup exits early (ginfast.c
> >> lines 796, 860, 898) if it detects that someone else is cleaning up
> >> the pending list, without waiting for that someone else to finish the
> >> job.
> >>
> >> Isn't this a problem?
> >
> >
> > Yep, I think you're right. When that code runs as part of VACUUM, it
> > should not give up like that.
> >
> > Hmm, I see other race conditions in that code too. Even if VACUUM wins
> > the race you spotted, and performs all the insertions, reaches the end
> > of the pending items list, and deletes the pending list pages, it's
> > possible that another backend started earlier, and is still processing
> > the same items from the pending items list. It will add them to the
> > tree, and after it's finished with that it will see that the pending
> > list page was already deleted, and bail out. But if there is a dead
> > tuple in the pending items list, you have trouble. The other backend
> > will re-insert it, and that might happen after VACUUM had already
> > removed it from the tree.
>
> Could the right to clean the pending list be represented by a
> self-conflicting heavyweight lock on the index? Vacuum could block on it,
> while user backends could try to get it conditionally and just give up on
> the cleanup if it is not available.
>
> >
> > Also, ginInsertCleanup() seems to assume that if another backend has
> > just finished cleaning up the same page, it will see the page marked
> > as deleted. But what if the page is not only marked as deleted, but
> > also reused for something else already?
>
> Yeah. That is possible but pretty unlikely now; it would be far more
> likely if we added these pages to the fsm more aggressively.
>
The attached patch takes a ShareUpdateExclusiveLock on the index in order
to clean the pending list.
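In sketch form, the locking looks like this (hypothetical helper name, not
the patch verbatim; the real change sits inside ginInsertCleanup()):

    #include "postgres.h"
    #include "storage/lmgr.h"
    #include "utils/rel.h"

    /*
     * Sketch only: VACUUM must finish the cleanup itself, so it waits
     * for the lock instead of bailing out when someone else is already
     * cleaning.  ShareUpdateExclusiveLock conflicts with itself but not
     * with RowExclusiveLock, so ordinary inserts can keep appending to
     * the pending list while the cleanup runs.
     */
    static void
    gin_pending_cleanup_for_vacuum(Relation index)
    {
        LockRelation(index, ShareUpdateExclusiveLock);

        /* ... move pending-list entries into the main GIN tree ... */

        UnlockRelation(index, ShareUpdateExclusiveLock);
    }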
This fixes the problem I had been seeing when testing
https://commitfest.postgresql.org/6/322/ (which was probably caused by the
deleted-page situation you mentioned, not by the tids-getting-revived
issue I originally brought up).
User backends attempt to take the lock conditionally, because otherwise
they would cause an autovacuum already holding the lock to cancel itself,
which seems quite bad.
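The conditional path, with the same caveats (and the same includes) as the
sketch above:

    /*
     * Sketch only: a user backend whose insert tripped the cleanup
     * threshold merely tries for the lock.  If it is unavailable, the
     * backend skips the cleanup and just appends to the pending list;
     * blocking here on a lock held by autovacuum would get the
     * autovacuum worker cancelled.
     */
    static void
    gin_pending_cleanup_opportunistic(Relation index)
    {
        if (!ConditionalLockRelation(index, ShareUpdateExclusiveLock))
            return;             /* someone else is cleaning */

        /* ... move pending-list entries into the main GIN tree ... */

        UnlockRelation(index, ShareUpdateExclusiveLock);
    }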
Note that this is a substantial behavior change: with this code, user
backends which find the list already being cleaned will just add to the
end of the pending list and go about their business. So if entries are
added to the list faster than they can be cleaned up, the pending list can
grow without bound.
Under the existing code, each concurrent user backend will try to clean
the pending list at the same time. The work doesn't parallelize, so doing
this just burns CPU (and possibly consumes up to maintenance_work_mem for
*each* backend), but it does serve to throttle the insertion rate and so
keeps the list from growing without bound.
This is just a proof-of-concept patch, because I don't know whether this
is the right approach. One potential problem is how it will interact with
CREATE INDEX CONCURRENTLY.
Cheers,
Jeff
Attachment: gin_pending_lock.patch (application/octet-stream, 1.3 KB)