From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
Subject: | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date: | 2024-03-22 05:19:44 |
Message-ID: | CAD21AoCaDT+-ZaVjbtvumms0tyyHPNLELK2UX-MLG9XCgioaNw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 21, 2024 at 7:48 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Thu, Mar 21, 2024 at 4:03 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've looked into this idea further. Overall, it looks clean and I
> > don't see any problem so far in terms of integration with lazy vacuum.
> > I've attached three patches for discussion and tests.
>
> Seems okay in the big picture, it's the details we need to be careful of.
>
> v77-0001
>
> - dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
> - dead_items->max_items = max_items;
> - dead_items->num_items = 0;
> + vacrel->dead_items = TidStoreCreate(vac_work_mem, NULL, 0);
> +
> + dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo));
> + dead_items_info->max_bytes = vac_work_mem * 1024L;
>
> This is confusing enough that it looks like a bug:
>
> [inside TidStoreCreate()]
> /* choose the maxBlockSize to be no larger than 1/16 of max_bytes */
> while (16 * maxBlockSize > max_bytes * 1024L)
> maxBlockSize >>= 1;
>
> This was copied from CreateWorkExprContext, which operates directly on
> work_mem -- if the parameter is actually bytes, we can't "* 1024"
> here. If we're passing something measured in kilobytes, the parameter
> is badly named. Let's use convert once and use bytes everywhere.
True. The attached 0001 patch fixes it.
>
> v77-0002:
>
> +#define dsa_create(tranch_id) \
> + dsa_create_ext(tranch_id, DSA_INITIAL_SEGMENT_SIZE, DSA_MAX_SEGMENT_SIZE)
>
> Since these macros are now referring to defaults, maybe their name
> should reflect that. Something like DSA_DEFAULT_INIT_SEGMENT_SIZE
> (*_MAX_*)
It makes sense to rename DSA_INITIAL_SEGMENT_SIZE , but I think that
the DSA_MAX_SEGMENT_SIZE is the theoretical maximum size, the current
name also makes sense to me.
>
> +/* The minimum size of a DSM segment. */
> +#define DSA_MIN_SEGMENT_SIZE ((size_t) 1024)
>
> That's a *lot* smaller than it is now. Maybe 256kB? We just want 1MB
> m_w_m to work correctly.
Fixed.
>
> v77-0003:
>
> +/* Public APIs to create local or shared TidStore */
> +
> +TidStore *
> +TidStoreCreateLocal(size_t max_bytes)
> +{
> + return tidstore_create_internal(max_bytes, false, 0);
> +}
> +
> +TidStore *
> +TidStoreCreateShared(size_t max_bytes, int tranche_id)
> +{
> + return tidstore_create_internal(max_bytes, true, tranche_id);
> +}
>
> I don't think these operations have enough in common to justify
> sharing even an internal implementation. Choosing aset block size is
> done for both memory types, but it's pointless to do it for shared
> memory, because the local context is then only used for small
> metadata.
>
> + /*
> + * Choose the DSA initial and max segment sizes to be no longer than
> + * 1/16 and 1/8 of max_bytes, respectively.
> + */
>
> I'm guessing the 1/8 here because the number of segments is limited? I
> know these numbers are somewhat arbitrary, but readers will wonder why
> one has 1/8 and the other has 1/16.
>
> + if (dsa_init_size < DSA_MIN_SEGMENT_SIZE)
> + dsa_init_size = DSA_MIN_SEGMENT_SIZE;
> + if (dsa_max_size < DSA_MAX_SEGMENT_SIZE)
> + dsa_max_size = DSA_MAX_SEGMENT_SIZE;
>
> The second clamp seems against the whole point of this patch -- it
> seems they should all be clamped bigger than the DSA_MIN_SEGMENT_SIZE?
> Did you try it with 1MB m_w_m?
I've incorporated the above comments and test results look good to me.
I've attached the several patches:
- 0002 is a minor fix for tidstore I found.
- 0005 changes the create APIs of tidstore.
- 0006 update the vacuum improvement patch to use the new
TidStoreCreateLocal/Shared() APIs.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v78-0005-Rethink-create-and-attach-APIs-of-shared-TidStor.patch | application/octet-stream | 10.1 KB |
v78-0004-Allow-specifying-initial-and-maximum-segment-siz.patch | application/octet-stream | 9.7 KB |
v78-0003-Use-TidStore-for-dead-tuple-TIDs-storage-during-.patch | application/octet-stream | 43.9 KB |
v78-0006-Adjust-the-vacuum-improvement-patch-to-new-TidSt.patch | application/octet-stream | 6.3 KB |
v78-0002-Fix-an-inconsistent-function-prototype-with-the-.patch | application/octet-stream | 1023 bytes |
v78-0001-Fix-a-calculation-in-TidStoreCreate.patch | application/octet-stream | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2024-03-22 05:33:37 | Re: SQL:2011 application time |
Previous Message | Amit Kapila | 2024-03-22 05:19:17 | Re: Introduce XID age and inactive timeout based replication slot invalidation |