From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date: | 2021-07-29 11:03:18 |
Message-ID: | 31974515f7fb873252e9474e54e44d49@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Masahiko Sawada писал 2021-07-29 12:11:
> On Thu, Jul 29, 2021 at 3:53 AM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
>>
>> Hi,
>>
>> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
>> > Apart from performance and memory usage points of view, we also need
>> > to consider the reusability of the code. When I started this thread, I
>> > thought the best data structure would be the one optimized for
>> > vacuum's dead tuple storage. However, if we can use a data structure
>> > that can also be used in general, we can use it also for other
>> > purposes. Moreover, if it's too optimized for the current TID system
>> > (32 bits block number, 16 bits offset number, maximum block/offset
>> > number, etc.) it may become a blocker for future changes.
>>
>> Indeed.
>>
>>
>> > In that sense, radix tree also seems good since it can also be used in
>> > gist vacuum as a replacement for intset, or a replacement for hash
>> > table for shared buffer as discussed before. Are there any other use
>> > cases?
>>
>> Yes, I think there are. Whenever there is some spatial locality it has
>> a
>> decent chance of winning over a hash table, and it will most of the
>> time
>> win over ordered datastructures like rbtrees (which perform very
>> poorly
>> due to the number of branches and pointer dispatches). There's plenty
>> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
>> degree of locality, so I'd expect a few potential uses. When adding
>> "tree compression" (i.e. skip inner nodes that have a single incoming
>> &
>> outgoing node) radix trees even can deal quite performantly with
>> variable width keys.
>
> Good point.
>
>>
>> > On the other hand, I’m concerned that radix tree would be an
>> > over-engineering in terms of vacuum's dead tuples storage since the
>> > dead tuple storage is static data and requires only lookup operation,
>> > so if we want to use radix tree as dead tuple storage, I'd like to see
>> > further use cases.
>>
>> I don't think we should rely on the read-only-ness. It seems pretty
>> clear that we'd want parallel dead-tuple scans at a point not too far
>> into the future?
>
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.
Main portion of svtm that leads to memory saving is compression of many
pages at once (CHUNK). It could be combined with radix as a storage for
pointers to CHUNKs.
For a moment I'm benchmarking IntegerSet replacement based on Trie (HATM
like)
and CHUNK compression, therefore datastructure could be used for gist
vacuum as well.
Since it is generic (allows to index all 64bit) it lacks of trick used
to speedup svtm. Still on 10x test it is faster than radix.
I'll send result later today after all benchmarks complete.
And I'll try then to make mix of radix and CHUNK compression.
> During the performance benchmark, I found some bugs in the radix tree
> implementation.
There is a bug in radix_to_key_off as well:
tid_i |= ItemPointerGetBlockNumber(tid) << shift;
ItemPointerGetBlockNumber returns uint32, therefore result after shift
is uint32 as well.
It leads to lesser memory consumption (and therefore better times) on
10x test, when page number exceed 2^23 (8M). It still produce "correct"
result for test since every page is filled in the same way.
Could you push your fixes for radix, please?
regards,
Yura Sokolov
y(dot)sokolov(at)postgrespro(dot)ru
funny(dot)falcon(at)gmail(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2021-07-29 11:17:09 | Re: [Patch] ALTER SYSTEM READ ONLY |
Previous Message | Dave Cramer | 2021-07-29 11:02:41 | Re: pg_upgrade does not upgrade pg_stat_statements properly |