Re: Shared detoast Datum proposal

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Zhilin <m(dot)zhilin(at)postgrespro(dot)ru>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Shared detoast Datum proposal
Date: 2024-03-05 09:03:35
Message-ID: CAN-LCVMpvTaWDO6RSKzS4D_-PVTnezRcVgP2O_ciUQiMuMFA1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Tomas, sorry for confusion, in my previous message I meant exactly
the same approach you've posted above, and came up with almost
the same implementation.

Thank you very much for your attention to this thread!

I've asked Andy about this approach because of the same reasons you
mentioned - it keeps cache code small, localized and easy to maintain.

The question that worries me is the memory limit, and I think that cache
lookup and entry invalidation should be done in toast_tuple_externalize
code, for the case the value has been detoasted previously is updated
in the same query, like
UPDATE test SET value = value || '...';

I've added cache entry invalidation and data removal on delete and update
of the toasted values and am currently experimenting with large values.

On Tue, Mar 5, 2024 at 5:59 AM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:

>
> >
> >> 2. more likely to use up all the memory which is allowed. for example:
> >> if we set the limit to 1MB, then we kept more data which will be not
> >> used and then consuming all of the 1MB.
> >>
> >> My method is resolving this with some helps from other modules (kind of
> >> invasive) but can control the eviction well and use the memory as less
> >> as possible.
> >>
> >
> > Is the memory usage really an issue? Sure, it'd be nice to evict entries
> > as soon as we can prove they are not needed anymore, but if the cache
> > limit is set to 1MB it's not really a problem to use 1MB.
>
> This might be a key point which leads us to some different directions, so
> I want to explain more about this, to see if we can get some agreement
> here.
>
> It is a bit hard to decide which memory limit to set, 1MB, 10MB or 40MB,
> 100MB. In my current case it is 40MB at least. Less memory limit
> causes cache ineffecitve, high memory limit cause potential memory
> use-up issue in the TOAST cache design. But in my method, even we set a
> higher value, it just limits the user case it really (nearly) needed,
> and it would not cache more values util the limit is hit. This would
> make a noticable difference when we want to set a high limit and we have
> some high active sessions, like 100 * 40MB = 4GB.
>
> > On 3/4/24 18:08, Andy Fan wrote:
> >> ...
> >>>
> >>>> I assumed that releasing all of the memory at the end of executor once
> >>>> is not an option since it may consumed too many memory. Then, when and
> >>>> which entry to release becomes a trouble for me. For example:
> >>>>
> >>>> QUERY PLAN
> >>>> ------------------------------
> >>>> Nested Loop
> >>>> Join Filter: (t1.a = t2.a)
> >>>> -> Seq Scan on t1
> >>>> -> Seq Scan on t2
> >>>> (4 rows)
> >>>>
> >>>> In this case t1.a needs a longer lifespan than t2.a since it is
> >>>> in outer relation. Without the help from slot's life-cycle system, I
> >>>> can't think out a answer for the above question.
> >>>>
> >>>
> >>> This is true, but how likely such plans are? I mean, surely no one
> would
> >>> do nested loop with sequential scans on reasonably large tables, so how
> >>> representative this example is?
> >>
> >> Acutally this is a simplest Join case, we still have same problem like
> >> Nested Loop + Index Scan which will be pretty common.
> >>
> >
> > Yes, I understand there are cases where LRU eviction may not be the best
> > choice - like here, where the "t1" should stay in the case. But there
> > are also cases where this is the wrong choice, and LRU would be better.
> >
> > For example a couple paragraphs down you suggest to enforce the memory
> > limit by disabling detoasting if the memory limit is reached. That means
> > the detoasting can get disabled because there's a single access to the
> > attribute somewhere "up the plan tree". But what if the other attributes
> > (which now won't be detoasted) are accessed many times until then?
>
> I am not sure I can't follow up here, but I want to explain more about
> the disable-detoast-sharing logic when the memory limit is hit. When
> this happen, the detoast sharing is disabled, but since the detoast
> datum will be released every soon when the slot->tts_values[*] is
> discard, then the 'disable' turn to 'enable' quickly. So It is not
> true that once it is get disabled, it can't be enabled any more for the
> given query.
>
> > I think LRU is a pretty good "default" algorithm if we don't have a very
> > good idea of the exact life span of the values, etc. Which is why other
> > nodes (e.g. Memoize) use LRU too.
>
> > But I wonder if there's a way to count how many times an attribute is
> > accessed (or is likely to be). That might be used to inform a better
> > eviction strategy.
>
> Yes, but in current issue we can get a better esitimation with the help
> of plan shape and Memoize depends on some planner information as well.
> If we bypass the planner information and try to resolve it at the
> cache level, the code may become to complex as well and all the cost is
> run time overhead while the other way is a planning timie overhead.
>
> > Also, we don't need to evict the whole entry - we could evict just the
> > data part (guaranteed to be fairly large), but keep the header, and keep
> > the counts, expected number of hits, and other info. And use this to
> > e.g. release entries that reached the expected number of hits. But I'd
> > probably start with LRU and only do this as an improvement later.
>
> A great lession learnt here, thanks for sharing this!
>
> As for the current user case what I want to highlight is in the current
> user case, we are "caching" "user data" "locally".
>
> USER DATA indicates it might be very huge, it is not common to have a
> 1M tables, but it is much common we have 1M Tuples in one scan, so
> keeping the header might extra memory usage as well, like 10M * 24 bytes
> = 240MB. LOCALLY means it is not friend to multi active sessions. CACHE
> indicates it is hard to evict correctly. My method also have the USER
> DATA, LOCALLY attributes, but it would be better at eviction. eviction
> then have lead to memory usage issue which is discribed at the beginning
> of this writing.
>
> >>> Also, this leads me to the need of having some sort of memory limit. If
> >>> we may be keeping entries for extended periods of time, and we don't
> >>> have any way to limit the amount of memory, that does not seem great.
> >>>
> >>> AFAIK if we detoast everything into tts_values[] there's no way to
> >>> implement and enforce such limit. What happens if there's a row with
> >>> multiple large-ish TOAST values? What happens if those rows are in
> >>> different (and distant) parts of the plan?
> >>
> >> I think this can be done by tracking the memory usage on EState level
> >> or global variable level and disable it if it exceeds the limits and
> >> resume it when we free the detoast datum when we don't need it. I think
> >> no other changes need to be done.
> >>
> >
> > That seems like a fair amount of additional complexity. And what if the
> > toasted values are accessed in context without EState (I haven't checked
> > how common / important that is)?
> >
> > And I'm not sure just disabling the detoast as a way to enforce a memory
> > limit, as explained earlier.
> >
> >>> It seems far easier to limit the memory with the toast cache.
> >>
> >> I think the memory limit and entry eviction is the key issue now. IMO,
> >> there are still some difference when both methods can support memory
> >> limit. The reason is my patch can grantee the cached memory will be
> >> reused, so if we set the limit to 10MB, we know all the 10MB is
> >> useful, but the TOAST cache method, probably can't grantee that, then
> >> when we want to make it effecitvely, we have to set a higher limit for
> >> this.
> >>
> >
> > Can it actually guarantee that? It can guarantee the slot may be used,
> > but I don't see how could it guarantee the detoasted value will be used.
> > We may be keeping the slot for other reasons. And even if it could
> > guarantee the detoasted value will be used, does that actually prove
> > it's better to keep that value? What if it's only used once, but it's
> > blocking detoasting of values that will be used 10x that?
> >
> > If we detoast a 10MB value in the outer side of the Nest Loop, what if
> > the inner path has multiple accesses to another 10MB value that now
> > can't be detoasted (as a shared value)?
>
> Grarantee may be wrong word. The difference in my mind are:
> 1. plan shape have better potential to know the user case of datum,
> since we know the plan tree and knows the rows pass to a given node.
> 2. Planning time effort is cheaper than run-time effort.
> 3. eviction in my method is not as important as it is in TOAST cache
> method since it is reset per slot, so usually it doesn't hit limit in
> fact. But as a cache, it does.
> 4. use up to memory limit we set in TOAST cache case.
>
> >>> In any case, my concern is more about having to do this when creating
> >>> the plan at all, the code complexity etc. Not just because it might
> have
> >>> performance impact.
> >>
> >> I think the main trade-off is TOAST cache method is pretty non-invasive
> >> but can't control the eviction well, the impacts includes:
> >> 1. may evicting the datum we want and kept the datum we don't need.
> >
> > This applies to any eviction algorithm, not just LRU. Ultimately what
> > matters is whether we have in the cache the most often used values, i.e.
> > the hit ratio (perhaps in combination with how expensive detoasting that
> > particular entry was).
>
> Correct, just that I am doubtful about design a LOCAL CACHE for USER
> DATA with the reason I described above.
>
> At last, thanks for your attention, really appreciated about it!
>
> --
> Best Regards
> Andy Fan
>
>
>
>

--
Regards,

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-03-05 09:05:40 Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message shveta malik 2024-03-05 08:49:19 Re: Missing LWLock protection in pgstat_reset_replslot()