From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot Standby b-tree delete records review |
Date: | 2010-11-09 11:34:56 |
Message-ID: | 4CD931E0.3020607@enterprisedb.com |
Lists: | pgsql-hackers |
(cleaning up my inbox, and bumped into this..)
On 22.04.2010 12:31, Simon Riggs wrote:
> On Thu, 2010-04-22 at 12:18 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2010-04-22 at 11:56 +0300, Heikki Linnakangas wrote:
>>>
>>>>>>>> If none of the removed heap tuples were present anymore, we currently
>>>>>>>> return InvalidTransactionId, which kills/waits out all read-only
>>>>>>>> queries. But if none of the tuples were present anymore, the read-only
>>>>>>>> queries wouldn't have seen them anyway, so ISTM that we should treat
>>>>>>>> InvalidTransactionId return value as "we don't need to kill anyone".
>>>>>>> That's not the point. The tuples were not themselves the sole focus,
>>>>>> Yes, they were. We're replaying a b-tree deletion record, which removes
>>>>>> pointers to some heap tuples, making them unreachable to any read-only
>>>>>> queries. If any of them still need to be visible to read-only queries,
>>>>>> we have a conflict. But if all of the heap tuples are gone already,
>>>>>> removing the index pointers to them can't change the situation for any
>>>>>> query. If any of them should've been visible to a query, the damage was
>>>>>> done already by whoever pruned the heap tuples leaving just the
>>>>>> tombstone LP_DEAD item pointers (in the heap) behind.
>>>>> You're missing my point. Those tuples are indicators of what may lie
>>>>> elsewhere in the database, completely unreferenced by this WAL record.
>>>>> Just because these referenced tuples are gone doesn't imply that all
>>>>> tuple versions written by the as yet-unknown-xids are also gone. We
>>>>> can't infer anything about the whole database just from one small group
>>>>> of records.
>>>> Have you got an example of that?
>>>
>>> I don't need one, I have suggested the safe route. In order to infer
>>> anything, and thereby further optimise things, we would need proof that
>>> no cases can exist, which I don't have. Perhaps we can add "yet", not
>>> sure about that either.
>>
>> It's good to be safe rather than sorry, but I'd still like to know
>> because I'm quite surprised by that, and it got me worried that I don't
>> understand how hot standby works as well as I thought I did. I thought
>> the point of stopping replay/killing queries at a b-tree deletion record
>> is precisely that it makes some heap tuples invisible to running
>> read-only queries. If it doesn't make any tuples invisible, why do any
>> queries need to be killed? And why was it OK for them to be running just
>> before replaying the b-tree deletion record?
>
> I'm sorry but I'm too busy to talk further on this today. Since we are
> discussing a further optimisation rather than a bug, I hope it is OK to
> come back to this again later.
Would now be a good time to revisit this? I still don't see why a b-tree
deletion record should conflict with anything, if all the removed index
tuples point to just LP_DEAD tombstones in the heap.
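
To make the two behaviours under discussion concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: the names `HeapItem`, `latest_removed_xid` and `conflicting_queries` are hypothetical, xids are plain integers with no wraparound handling, and a standby query is represented only by its snapshot xmin. The assumed conflict rule is that a query conflicts when its xmin does not follow the latest removed xid.

```python
from dataclasses import dataclass
from typing import Iterable, List

INVALID_XID = 0  # stand-in for InvalidTransactionId (assumption of this sketch)


@dataclass
class HeapItem:
    """One heap item referenced by a removed index tuple (hypothetical model)."""
    lp_dead: bool            # True if only the LP_DEAD tombstone is left
    xmax: int = INVALID_XID  # deleting xid; meaningful only if the tuple still exists


def latest_removed_xid(items: Iterable[HeapItem]) -> int:
    """Highest deleting xid among heap tuples that are still present.

    Returns INVALID_XID when every referenced item is already an LP_DEAD tombstone.
    """
    latest = INVALID_XID
    for item in items:
        if item.lp_dead:
            continue  # tuple already pruned, nothing left to inspect
        latest = max(latest, item.xmax)
    return latest


def conflicting_queries(items: Iterable[HeapItem],
                        snapshot_xmins: List[int],
                        invalid_means_no_conflict: bool) -> List[int]:
    """Return the snapshot xmins of standby queries that would conflict.

    invalid_means_no_conflict=False models the behaviour described as current
    in this thread (an invalid xid conflicts with every query);
    True models the proposed behaviour (an invalid xid conflicts with nobody).
    """
    latest = latest_removed_xid(items)
    if latest == INVALID_XID:
        return [] if invalid_means_no_conflict else list(snapshot_xmins)
    # Simplified rule: conflict if the snapshot xmin is not after the removed xid.
    return [xmin for xmin in snapshot_xmins if xmin <= latest]
```

With every referenced heap item already reduced to an LP_DEAD tombstone, the first mode reports a conflict with all running queries and the second reports none. When some tuples are still present, both modes agree.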
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com