Re: Shared detoast Datum proposal

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Zhilin <m(dot)zhilin(at)postgrespro(dot)ru>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Shared detoast Datum proposal
Date: 2024-06-22 03:10:41
Message-ID: 871q4pn03y.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

Let's me clarify the current state of this patch and what kind of help
is needed. You can check [1] for what kind of problem it try to solve
and what is the current approach.

The current blocker is if the approach is safe (potential to memory leak
crash etc.). And I want to have some agreement on this step at least.

"""
When we access [some desired] toast column in EEOP_{SCAN|INNER_OUTER}_VAR
steps, we can detoast it immediately and save it back to
slot->tts_values. so the later ExprState can use the same detoast version
all the time. this should be more the most effective and simplest way.
"""

IMO, it doesn't break any design of TupleTableSlot and should be
safe. Here is my understanding of TupleTableSlot, please correct me if
anything I missed.

(a). TupleTableSlot define tts_values[*] / tts_isnulls[*] at the high level.
When the upper layer want a attnum, it just access tts_values/nulls[n].

(b). TupleTableSlot has many different variants, depends on how to get
tts_values[*] from them, like VirtualTupleTableSlot, HeapTupleTableSlot,
how to manage its resource (mainly memory) when the life cycle is
done, for example BufferHeapTupleTableSlot.

(c). TupleTableSlot defines a lot operation against on that, like
getsomeattrs, copy, clear and so on, for the different variants.

the impacts of my proposal on the above factor:

(a). it doesn't break it clearly,
(b). it adds a 'detoast' step for step (b) when fill the tts_values[*],
(this is done conditionally, see [1] for the overall design, we
currently focus the safety). What's more, I only added the step in
EEOP_{SCAN|INNER_OUTER}_VAR_TOAST step, it reduce the impact in another
step.
(c). a special MemoryContext in TupleTableSlot is introduced for the
detoast datum, it is reset whenever the TupleTableSlot.tts_nvalid
= 0. and I also double check every operation defined in
TupleTableSlotOps, looks nothing should be specially handled.

Robert has expressed his concern as below and wanted Andres to have a
look at, but unluckly Andres doesn't have such time so far:(

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>> On Wed, May 22, 2024 at 9:46 PM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>> Please give me one more chance to explain this. What I mean is:
>>>
>>> Take SELECT f(a) FROM t1 join t2...; for example,
>>>
>>> When we read the Datum of a Var, we read it from tts->tts_values[*], no
>>> matter what kind of TupleTableSlot is. So if we can put the "detoast
>>> datum" back to the "original " tts_values, then nothing need to be
>>> changed.
>>
>> Yeah, I don't think you can do that.
>>
>>> - Saving the "detoast datum" version into tts_values[*] doesn't break
>>> the above semantic and the ExprEval engine just get a detoast version
>>> so it doesn't need to detoast it again.
>>
>> I don't think this is true. If it is true, it needs to be extremely
>> well-justified. Even if this seems to work in simple cases, I suspect
>> there will be cases where it breaks badly, resulting in memory leaks
>> or server crashes or some other kind of horrible problem that I can't
>> quite imagine. Unfortunately, my knowledge of this code isn't
>> fantastic, so I can't say exactly what bad thing will happen, and I
>> can't even say with 100% certainty that anything bad will happen, but
>> I bet it will. It seems like it goes against everything I understand
>> about how TupleTableSlots are supposed to be used. (Andres would be
>> the best person to give a definitive answer here.)

I think having a discussion of the design of TupleTableSlots would be
helpful for the progress since in my knowldege it doesn't against
anything of the TupleTaleSlots :(, as I stated above. I'm sure I'm
pretty open on this.

>>
>>> - The keypoint is the memory management and effeiciency. for now I think
>>> all the places where "slot->tts_nvalid" is set to 0 means the
>>> tts_values[*] is no longer validate, then this is the place we should
>>> release the memory for the "detoast datum". All the other places like
>>> ExecMaterializeSlot or ExecCopySlot also need to think about the
>>> "detoast datum" as well.
>>
>> This might be a way of addressing some of the issues with this idea,
>> but I doubt it will be acceptable. I don't think we want to complicate
>> the slot API for this feature. There's too much downside to doing
>> that, in terms of performance and understandability.

Complexity: I double check the code, it has very limitted changes on the
TupleTupleSlot APIs (less than 10 rows). see tuptable.h.

Performance: by design, it just move the chance of detoast action
*conditionly* and put it back to tts_values[*], there is no extra
overhead to find out the detoast version for sharing.

Understandability: based on how do we understand TupleTableSlot:)

[1] https://www.postgresql.org/message-id/87il4jrk1l.fsf%40163.com

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-06-22 04:10:32 Re: cost delay brainstorming
Previous Message Joel Jacobson 2024-06-22 00:54:30 Re: Add pg_get_acl() function get the ACL for a database object