From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Heap WARM Tuples - Design Draft |
Date: | 2016-08-04 18:23:10 |
Message-ID: | CAGTBQpZkYKO608bJPMgBkr75ujQvZTJxh4DdBXSJHZGSHZPr8g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 4, 2016 at 3:07 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, Aug 4, 2016 at 02:35:32PM -0300, Claudio Freire wrote:
>> > Vacuum will need to be taught not to break the invariant, but I
>> > believe this is viable and efficient.
>>
>>
>> And I didn't mention the invariant.
>>
>> The invariant would be that all WARM chains:
>>
>> * contain entire HOT chains (ie: no item pointers pointing to the
>> middle of a HOT chain)
>
> You mean no _index_ item pointers, right?
Yes
>> * start at the item pointer, and end when the t_ctid of the heap
>> tuple either points to another page, or a tuple with a different index
>> key
>
> Uh, there is only one visible tuple in a HOT chain, so I don't see the
> different index key as being a significant check.
For a HOT chain, sure. That's why I mentioned it as an optimization.
But t_ctid, I checked, unless I read the code wrong, is set for
non-HOT updates too.
So following t_ctid regardless of the HOT-update flags should yield
update chains that can walk several buffers, and that's in fact the
mechanism I'm proposing, only just stop following the chain when it
jumps buffers.
> That is, I think we
> should check the visibility first, as we do now, and then, for the
> only-visible tuple, check the key.
No, because, as I explained in the example, doing that will result in
duplicates being returned.
The whole update chain can span multiple WARM chains, and each WARM
chain can span multiple HOT chains.
Consider, assuming there's an index on id and another on val:
INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
UPDATE t SET dat='blabla2' where id = 32;
UPDATE t SET dat='blabla3', id=33 where id = 32;
UPDATE t SET val='b' where id = 33;
UPDATE t SET dat='blabla4' where id = 33;
UPDATE t SET val='a' where id = 33;
This produces a case similar to the one I described. Assuming all
updates happen on the same page, the first will be HOT, so no key
check is needed. The second cannot be HOT, because it updates the id.
But it can be WARM, if it's done on the same page, so it can avoid
updating the index on val. The third will need a new index item
pointer, because it has a new key. The t_ctid chain will walk all the
versions of the row, but the one with 'blabla2' will not have the
HOT-updated bit set. So regular HOT chain walking will stop there, I
propose to not stop there, so the index on val can follow the chain
into blabla3 as if it were a HOT update. Correctness is achieved only
if there is no index item pointer on that index pointing to that index
version, and since there is no efficient way of checking, the
invariants I propose aim to guarantee it so it can be assumed. Since
WARM updates don't create new index item pointers, what needs to be
checked is whether updating from version 2 to version 3 would be a
WARM update on this index or not, and that equals to checking the
index keys for equality.
> I don't see a value in (expensive)
> checking the key of an invisible tuple in hopes of stoping the HOT chain
> traversal.
Not HOT chain, the point is to guarantee correctness by properly
identifying the end of the WARM (not HOT) chain, which is left
implicit.
> Also, what happens when a tuple chain goes off-page, then returns to the
> original page?
The WARM chain ends on the page hop, and when it returns it's a new
WARM chain. So traversal would not follow t_ctid pointers that hop
pages, which makes it efficient, and also guarantees correctness
(since if the is a page hop, we know the index will contain an item
pointer to the version in that other page, so we'll visit it when we
get there).
> That is a new HOT chain given our current code, but
> would the new tuple addition realize it needs to create a new index
> tuple? The new tuple code needs to check the index's root HOT tid for a
> non-match.
I'm not proposing to change how HOT works, but adding another very
similar mechanism on top of it called WARM.
So HOT will keep working like that, HOT pruning will happen as usual,
but we'll have the concept (immaterialized in the physical
representation of the data, just implicit) of WARM chains. WARM chains
would span one or several HOT chains, so they're "bigger".
>> This is maintained by:
>>
>> * No item pointer will be created for row versions whose immediately
>> previous version is already on the index with the same key
>
> You really only have the ctid HOT head stored in the index, not the
> immediate ctid.
I know. I just mentioned it just in case there was a transient time
during cleanup when it isn't true, but thinking it a little bit more,
if it weren't, HOT would also cause duplicates during index scans, so
it's probably not the case (or protected with a lock).
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-08-04 18:25:52 | Re: handling unconvertible error messages |
Previous Message | Tom Lane | 2016-08-04 18:21:01 | Re: improved DefElem list processing |