Re: type cache cleanup improvements

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: type cache cleanup improvements
Date: 2024-08-05 01:16:07
Message-ID: CAPpHfdtNU6Lz21_ZtNDg_d+WoUvK3Togukp+cMu0cQp7o1w_OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Wed, Apr 3, 2024 at 9:07 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 3/15/24 17:57, Teodor Sigaev wrote:
> >> Okay, I've applied this piece for now. Not sure I'll have much room
> >> to look at the rest.
> >
> > Thank you very much!
> I have spent some time reviewing this feature. I think we can discuss
> and apply it step-by-step. So, the 0001-* patch is at this moment.
> The feature addresses the issue of TypCache being bloated by intensive
> usage of non-standard types and domains. It adds significant overhead
> during relcache invalidation by thoroughly scanning this hash table.
> IMO, this feature will be handy soon, as we already see some patches
> where TypCache is intensively used for storing composite types—for
> example, look into solutions proposed in [1].
> One of my main concerns with this feature is the possibility of lost
> entries, which could be mistakenly used by relations with the same oid
> in the future. This seems particularly possible in cases with multiple
> temporary tables. The author has attempted to address this by replacing
> the typrelid and type_id fields in the mapRelType on each call of
> lookup_type_cache. However, I believe we could further improve this by
> removing the entry from mapRelType on invalidation, thus avoiding this
> potential issue.
> While reviewing the patch, I made some minor changes (see attachment)
> that you're free to adopt or reject. However, it's crucial that the
> patch includes a detailed explanation, not just a single sentence, to
> ensure everyone understands the changes.
> Upon closer inspection, I noticed that the current implementation only
> invalidates the cache entry. While this is acceptable for standard
> types, it may not be sufficient to maintain numerous custom types (as in
> the example in the initial letter) or in cases where whole-row vars are
> heavily used. In such scenarios, removing the entry and reducing the
> hash table's size might be more efficient.
> In toto, the 0001-* patch looks good, and I would be glad to see it in
> the core.

I've revised the patchset. First of all, I've re-ordered the patches.

0001-0002 (former 0002-0003)
Comprises hash_search_with_hash_value() function and its application
to avoid full hash iteration in InvalidateAttoptCacheCallback() and
TypeCacheTypCallback(). I think this is quite straightforward
optimization without negative side effects. I've revised comments,
commit message and did some code beautification. I'm going to push
this if no objections.

0003 (former 0001)
I've revised this patch. I think main concerns expressed in the
thread about this path is that we don't have invalidation mechanism
for relid => typid map. Finally due to oid wraparound same relids
could get reused. That could lead to invalid entries in the map about
existing relids and typeids. This is rather messy, but I don't think
this could cause a material bug. The maps items are used only for
cache invalidation. Extra invalidation doesn't cause a bug. If type
with same relid will be cached, then correspoding map item will be
overridden, so no missing invalidation. However, I see the following
reasons for keeping consistent state of relid => typid map.

1) As the main use-case for this optimization is flood of temporary
tables, it would be nice not let relid => typid map bloat in this
case. I see that TypeCacheHash would get bloated, because its entries
are never deleted. However, I would prefer to not get this situation
even worse.
2) In future we may find some more use-cases for relid => typid map
besides cache invalidation. Keeping that in consistent state could be
advantage then.

In the attached patch, I'm keeping relid => typid map when
corresponding typentry have either TCFLAGS_HAVE_PG_TYPE_DATA, or
TCFLAGS_OPERATOR_FLAGS, or tupdesc. Thus, when temporary table gets
deleted, we would invalidate the map item.

It will be also nice to get rid of iteration over all the cached
domain types in TypeCacheRelCallback(). However, this typically
shouldn't be a problem since domain types are less tended to bloat.
Domain types are created manually, unlike composite types which are
automatically created for every temporary table. We will probably
need to optimize this in future, but I don't feel this to be necessary
in present patch.

I think the revised 0003 requires review.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v7-0001-Introduce-hash_search_with_hash_value-function.patch application/octet-stream 3.3 KB
v7-0002-Optimize-InvalidateAttoptCacheCallback-and-TypeCa.patch application/octet-stream 6.7 KB
v7-0003-Avoid-looping-over-all-type-cache-entries-in-Type.patch application/octet-stream 14.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Senglee Choi 2024-08-05 01:23:42 Fixed small typo in bufpage.h
Previous Message Joseph Koshakow 2024-08-04 23:55:12 Re: Remove dependence on integer wrapping