Re: The lightbulb just went on...

From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alfred Perlstein <bright(at)wintelcom(dot)net>, "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: The lightbulb just went on...
Date: 2000-10-17 00:54:00
Message-ID: Pine.BSF.4.21.0010162151320.342-100000@thelab.hub.org
Lists: pgsql-hackers


Something to force a v7.0.3 ... ?

On Mon, 16 Oct 2000, Tom Lane wrote:

> ... with a blinding flash ...
>
> The VACUUM funnies I was complaining about before may or may not be real
> bugs, but they are not what's biting Alfred. None of them can lead to
> the observed crashes AFAICT.
>
> What's biting Alfred is the code that moves a tuple update chain, lines
> 1541 ff in REL7_0_PATCHES. This sets up a pointer to a source tuple in
> "tuple". Then it gets the destination page it plans to move the tuple
> to, and applies vc_vacpage to that page if it hasn't been done already.
> But when we're moving a tuple chain, *it is possible for the destination
> page to be the same as the source page*. Since vc_vacpage applies
> PageRepairFragmentation, all the live tuples on the page may get moved.
> Afterwards, tuple.t_data is out of date and pointing at some random
> chunk of some other tuple. The subsequent copy of the tuple copies
> garbage, which explains Alfred's several crashes in constructing index
> entries for the copied tuple (all of which bombed out from the
> index-build calls at lines 1634 ff, i.e., for tuples being moved as part
> of a chain). Once in a while, the obsolete pointer will be pointing at
> the real header of a different tuple --- perhaps even the place where we
> are about to put the copy. This improbable case explains the one
> observed Assert crash in which a copied tuple's HEAP_MOVED_IN bit
> mysteriously got turned off. Reason: it was cleared through the
> old-tuple pointer just after being set via the new-tuple one.
>
> Proof that this is happening can be seen in the core dumps for Alfred's
> index-construction-crash cases: tuple.t_data does not point at the same
> place that the tuple.ip_posid'th page line item points at. This could
> only happen if the page was reshuffled since the tuple pointer was set
> up. The explanation for the Assert crash is a bit of a leap of faith,
> but I feel confident that it's right.
>
> The solution is to do everything we're going to do with the source
> tuple, especially copying it and updating its state, *before* we apply
> vc_vacpage to the destination page. Then we don't care if the source
> gets moved during vc_vacpage.
>
> I will prepare a patch along this line and send it to Alfred for
> testing.
>
> regards, tom lane
>
>

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org
