Re: Shared detoast Datum proposal

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Michael Zhilin <m(dot)zhilin(at)postgrespro(dot)ru>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Shared detoast Datum proposal
Date: 2024-03-04 20:07:04
Message-ID: 05aca065-f1f1-48af-b0e8-5eaa72ee6b6f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/4/24 18:08, Andy Fan wrote:
> ...
>>
>>> I assumed that releasing all of the memory at the end of executor once
>>> is not an option since it may consumed too many memory. Then, when and
>>> which entry to release becomes a trouble for me. For example:
>>>
>>> QUERY PLAN
>>> ------------------------------
>>> Nested Loop
>>> Join Filter: (t1.a = t2.a)
>>> -> Seq Scan on t1
>>> -> Seq Scan on t2
>>> (4 rows)
>>>
>>> In this case t1.a needs a longer lifespan than t2.a since it is
>>> in outer relation. Without the help from slot's life-cycle system, I
>>> can't think out a answer for the above question.
>>>
>>
>> This is true, but how likely such plans are? I mean, surely no one would
>> do nested loop with sequential scans on reasonably large tables, so how
>> representative this example is?
>
> Acutally this is a simplest Join case, we still have same problem like
> Nested Loop + Index Scan which will be pretty common.
>

Yes, I understand there are cases where LRU eviction may not be the best
choice - like here, where the "t1" should stay in the case. But there
are also cases where this is the wrong choice, and LRU would be better.

For example a couple paragraphs down you suggest to enforce the memory
limit by disabling detoasting if the memory limit is reached. That means
the detoasting can get disabled because there's a single access to the
attribute somewhere "up the plan tree". But what if the other attributes
(which now won't be detoasted) are accessed many times until then?

I think LRU is a pretty good "default" algorithm if we don't have a very
good idea of the exact life span of the values, etc. Which is why other
nodes (e.g. Memoize) use LRU too.

But I wonder if there's a way to count how many times an attribute is
accessed (or is likely to be). That might be used to inform a better
eviction strategy.

Also, we don't need to evict the whole entry - we could evict just the
data part (guaranteed to be fairly large), but keep the header, and keep
the counts, expected number of hits, and other info. And use this to
e.g. release entries that reached the expected number of hits. But I'd
probably start with LRU and only do this as an improvement later.

>> Also, this leads me to the need of having some sort of memory limit. If
>> we may be keeping entries for extended periods of time, and we don't
>> have any way to limit the amount of memory, that does not seem great.
>>
>> AFAIK if we detoast everything into tts_values[] there's no way to
>> implement and enforce such limit. What happens if there's a row with
>> multiple large-ish TOAST values? What happens if those rows are in
>> different (and distant) parts of the plan?
>
> I think this can be done by tracking the memory usage on EState level
> or global variable level and disable it if it exceeds the limits and
> resume it when we free the detoast datum when we don't need it. I think
> no other changes need to be done.
>

That seems like a fair amount of additional complexity. And what if the
toasted values are accessed in context without EState (I haven't checked
how common / important that is)?

And I'm not sure just disabling the detoast as a way to enforce a memory
limit, as explained earlier.

>> It seems far easier to limit the memory with the toast cache.
>
> I think the memory limit and entry eviction is the key issue now. IMO,
> there are still some difference when both methods can support memory
> limit. The reason is my patch can grantee the cached memory will be
> reused, so if we set the limit to 10MB, we know all the 10MB is
> useful, but the TOAST cache method, probably can't grantee that, then
> when we want to make it effecitvely, we have to set a higher limit for
> this.
>

Can it actually guarantee that? It can guarantee the slot may be used,
but I don't see how could it guarantee the detoasted value will be used.
We may be keeping the slot for other reasons. And even if it could
guarantee the detoasted value will be used, does that actually prove
it's better to keep that value? What if it's only used once, but it's
blocking detoasting of values that will be used 10x that?

If we detoast a 10MB value in the outer side of the Nest Loop, what if
the inner path has multiple accesses to another 10MB value that now
can't be detoasted (as a shared value)?

>
>>> Another difference between the 2 methods is my method have many
>>> modification on createplan.c/setref.c/execExpr.c/execExprInterp.c, but
>>> it can save some run-time effort like hash_search find / enter run-time
>>> in method 2 since I put them directly into tts_values[*].
>>>
>>> I'm not sure the factor 2 makes some real measurable difference in real
>>> case, so my current concern mainly comes from factor 1.
>>> """
>>>
>>
>> This seems a bit dismissive of the (possible) issue.
>
> Hmm, sorry about this, that is not my intention:(
>

No need to apologize.

>> It'd be good to do
>> some measurements, especially on simple queries that can't benefit from
>> the detoasting (e.g. because there's no varlena attribute).
>
> This testing for the queries which have no varlena attribute was done at
> the very begining of this thread, and for now, the code is much more
> nicer for this sistuation. all the changes in execExpr.c
> execExprInterp.c has no impact on this. Only the plan walker codes
> matter. Here is a test based on tenk1.
>
> q1: explain select count(*) from tenk1 where odd > 10 and even > 30;
> q2: select count(*) from tenk1 where odd > 10 and even > 30;
>
> pgbench -n -f qx.sql regression -T10
>
> | Query | master (ms) | patched (ms) |
> |-------+-------------+--------------|
> | q1 | 0.129 | 0.129 |
> | q2 | 1.876 | 1.870 |
>
> there are some error for patched-q2 combination, but at least it can
> show it doesn't cause noticable regression.
>

OK, I'll try to do some experiments with these cases too.

>> In any case, my concern is more about having to do this when creating
>> the plan at all, the code complexity etc. Not just because it might have
>> performance impact.
>
> I think the main trade-off is TOAST cache method is pretty non-invasive
> but can't control the eviction well, the impacts includes:
> 1. may evicting the datum we want and kept the datum we don't need.

This applies to any eviction algorithm, not just LRU. Ultimately what
matters is whether we have in the cache the most often used values, i.e.
the hit ratio (perhaps in combination with how expensive detoasting that
particular entry was).

> 2. more likely to use up all the memory which is allowed. for example:
> if we set the limit to 1MB, then we kept more data which will be not
> used and then consuming all of the 1MB.
>
> My method is resolving this with some helps from other modules (kind of
> invasive) but can control the eviction well and use the memory as less
> as possible.
>

Is the memory usage really an issue? Sure, it'd be nice to evict entries
as soon as we can prove they are not needed anymore, but if the cache
limit is set to 1MB it's not really a problem to use 1MB.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-03-04 20:27:23 Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
Previous Message Daniel Gustafsson 2024-03-04 20:04:14 Re: Fix a typo in pg_rotate_logfile