From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Michael Zhilin <m(dot)zhilin(at)postgrespro(dot)ru>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Shared detoast Datum proposal
Date: 2024-03-04 17:08:29
Message-ID: 87y1axc1o1.fsf@163.com
Lists: pgsql-hackers
>
> I decided to whip up a PoC patch implementing this approach, to get a
> better idea of how it compares to the original proposal, both in terms
> of performance and code complexity.
>
> Attached are two patches that both add a simple cache in detoast.c, but
> differ in when exactly the caching happens.
>
> toast-cache-v1 - caching happens in toast_fetch_datum, which means this
> happens before decompression
>
> toast-cache-v2 - caching happens in detoast_attr, after decompression
...
>
> I think implementation-wise this is pretty non-invasive.
..
>
> Performance-wise, these patches are slower than with Andy's patch. For
> example for the jsonb Q1, I see master ~500ms, Andy's patch ~100ms and
> v2 at ~150ms. I'm sure there's a number of optimization opportunities
> that could get v2 closer to the 100ms.
>
> v1 is not competitive at all in this jsonb/Q1 test - it's just as slow
> as master, because the data set is small enough to be fully cached, so
> there's no I/O and the decompression is the actual bottleneck.
>
> That being said, I'm not convinced v1 would be a bad choice for some
> cases. If there's memory pressure enough to evict TOAST, it's quite
> possible the I/O would become the bottleneck. At which point it might be
> a good trade off to cache compressed data, because then we'd cache more
> values.
>
> OTOH it's likely the access to TOAST values is localized (in temporal
> sense), i.e. we read it from disk, detoast it a couple times in a short
> time interval, and then move to the next row. That'd mean v2 is probably
> the correct trade off.
I agree with the above statements, and thanks for the PoC code, which
will make later testing much easier.
>>
>> """
>> An alternative design: toast cache
>> ----------------------------------
>>
>> This method was suggested by Tomas during the review process. IIUC,
>> it would maintain a local HTAB that maps a toast datum to its detoasted
>> datum, with the entries maintained / used in the detoast_attr
>> function. With this method the overall design is pretty clear and the
>> code modification can be confined to the toasting system only.
>>
>
> Right.
>
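(Just to make sure I understand that design correctly, here is a rough
sketch of how I picture such a cache -- the struct and names below are
made up by me for illustration, they are not taken from your PoC
patches, and they assume postgres.h / utils/hsearch.h:)

/*
 * Illustrative sketch only, not code from the PoC patches.
 * A backend-local hash table in detoast.c, keyed by the on-disk toast
 * pointer (toast relation OID + value id), holding a detoasted copy of
 * the value plus its size so a memory limit could be enforced later.
 */
typedef struct ToastCacheKey
{
    Oid         toastrelid;     /* va_toastrelid from the toast pointer */
    Oid         valueid;        /* va_valueid from the toast pointer */
} ToastCacheKey;

typedef struct ToastCacheEntry
{
    ToastCacheKey key;
    struct varlena *value;      /* detoasted (v2: also decompressed) copy */
    Size        size;           /* bytes cached, for memory accounting */
} ToastCacheEntry;

static HTAB *toast_cache = NULL;

with detoast_attr() doing a hash_search() lookup / HASH_ENTER on this
table before falling back to the real detoasting.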
>> I assumed that releasing all of the memory once at the end of the
>> executor is not an option since it may consume too much memory. Then,
>> when and which entries to release becomes a problem for me. For example:
>>
>> QUERY PLAN
>> ------------------------------
>> Nested Loop
>> Join Filter: (t1.a = t2.a)
>> -> Seq Scan on t1
>> -> Seq Scan on t2
>> (4 rows)
>>
>> In this case t1.a needs a longer lifespan than t2.a since it is in
>> the outer relation. Without the help of the slot's life-cycle system, I
>> can't come up with an answer for the above question.
>>
>
> This is true, but how likely are such plans? I mean, surely no one would
> do a nested loop with sequential scans on reasonably large tables, so how
> representative is this example?
Actually this is just the simplest join case; we still have the same
problem with Nested Loop + Index Scan, which will be pretty common (see
the example plan shape below).
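For example (the index name is invented just for illustration):

   Nested Loop
     ->  Seq Scan on t1
     ->  Index Scan using t2_a_idx on t2
           Index Cond: (a = t1.a)

Here t1.a from the outer side still outlives each inner scan, the same
as in the Seq Scan case.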
> Also, this leads me to the need of having some sort of memory limit. If
> we may be keeping entries for extended periods of time, and we don't
> have any way to limit the amount of memory, that does not seem great.
>
> AFAIK if we detoast everything into tts_values[] there's no way to
> implement and enforce such limit. What happens if there's a row with
> multiple large-ish TOAST values? What happens if those rows are in
> different (and distant) parts of the plan?
I think this can be done by tracking the memory usage at the EState
level or in a global variable, disabling the pre-detoasting once it
exceeds the limit, and resuming it once we free detoasted datums we no
longer need. I don't think any other changes are needed; something
along the lines of the sketch below.
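(Purely illustrative -- the names and the GUC are hypothetical, this is
not code from my patch:)

#include "postgres.h"

/* hypothetical GUC: cap on pre-detoasted data kept in tts_values[] */
static int  shared_detoast_limit_kb = 10 * 1024;

/* bytes of detoasted data currently referenced from tts_values[] */
static Size shared_detoast_bytes = 0;

/*
 * Checked before detoasting a datum into tts_values[]; when it would
 * exceed the limit we simply skip the pre-detoasting and fall back to
 * the usual per-call detoast.  shared_detoast_bytes is decreased again
 * when the slot releases the detoasted copy.
 */
static inline bool
shared_detoast_within_limit(Size extra)
{
    return shared_detoast_bytes + extra <=
        (Size) shared_detoast_limit_kb * 1024;
}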
> It seems far easier to limit the memory with the toast cache.
I think the memory limit and entry eviction are the key issues now.
IMO there is still a difference even when both methods support a memory
limit: my patch can guarantee that the cached memory will be reused, so
if we set the limit to 10MB we know all of the 10MB is useful; the
TOAST cache method probably can't guarantee that, so to make it
effective we would have to set a higher limit.
>> Another difference between the 2 methods is that my method has many
>> modifications in createplan.c/setref.c/execExpr.c/execExprInterp.c, but
>> it can save some run-time effort, like the hash_search find / enter of
>> method 2, since I put the datums directly into tts_values[*].
>>
>> I'm not sure factor 2 makes a real measurable difference in practice,
>> so my current concern mainly comes from factor 1.
>> """
>>
>
> This seems a bit dismissive of the (possible) issue.
Hmm, sorry about that, it was not my intention :(
> It'd be good to do
> some measurements, especially on simple queries that can't benefit from
> the detoasting (e.g. because there's no varlena attribute).
The testing of queries which have no varlena attribute was done at the
very beginning of this thread, and by now the code handles this
situation much more nicely: all the changes in execExpr.c /
execExprInterp.c have no impact on it, only the plan-walker code
matters. Here is a test based on tenk1.
q1: explain select count(*) from tenk1 where odd > 10 and even > 30;
q2: select count(*) from tenk1 where odd > 10 and even > 30;
pgbench -n -f qx.sql regression -T10
| Query | master (ms) | patched (ms) |
|-------+-------------+--------------|
| q1 | 0.129 | 0.129 |
| q2 | 1.876 | 1.870 |
There is some noise in the patched q2 number, but at least it shows
the patch doesn't cause a noticeable regression.
> In any case, my concern is more about having to do this when creating
> the plan at all, the code complexity etc. Not just because it might have
> performance impact.
I think the main trade-off is that the TOAST cache method is pretty
non-invasive but can't control the eviction well; the impacts include:
1. It may evict the datums we want and keep the datums we don't need.
2. It is more likely to use up all of the memory that is allowed. For
   example, if we set the limit to 1MB, we keep more data that will not
   be used and end up consuming all of the 1MB.
My method resolves this with some help from other modules (kind of
invasive), but it can control the eviction well and use as little
memory as possible.
--
Best Regards
Andy Fan