Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-12-19 05:37:04
Message-ID: CAD21AoAoV=2Fip+1xdb++ezKNEcCSFMY-L1E3tVwdMdKO_ekKA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 18, 2023 at 3:41 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Fri, Dec 15, 2023 at 3:15 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 15, 2023 at 10:30 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > > parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
> > > - int nrequested_workers, int max_items,
> > > - int elevel, BufferAccessStrategy bstrategy)
> > > + int nrequested_workers, int vac_work_mem,
> > > + int max_offset, int elevel,
> > > + BufferAccessStrategy bstrategy)
> > >
> > > It seems very strange to me that this function has to pass the
> > > max_offset. In general, it's been simpler to assume we have a constant
> > > max_offset, but in this case that fact is not helping. Something to
> > > think about for later.
> >
> > max_offset was previously used in old TID encoding in tidstore. Since
> > tidstore has entries for each block, I think we no longer need it.
>
> It's needed now to properly size the allocation of TidStoreIter which
> contains...
>
> +/* Result struct for TidStoreIterateNext */
> +typedef struct TidStoreIterResult
> +{
> + BlockNumber blkno;
> + int num_offsets;
> + OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
> +} TidStoreIterResult;
>
> Maybe we can palloc the offset array to "almost always" big enough,
> with logic to resize if needed? If not too hard, seems worth it to
> avoid churn in the parameter list.

Yes, I was thinking of that.
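For illustration, the "palloc big enough, resize if needed" idea could look roughly like the sketch below. This is a hedged sketch, not the actual patch: the struct layout (a separately allocated, growable offsets array instead of a flexible array member), the initial capacity, and the helper names are all assumptions; PostgreSQL code would use palloc/repalloc in a memory context rather than malloc/realloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the PostgreSQL typedefs */
typedef unsigned int BlockNumber;
typedef unsigned short OffsetNumber;

/* Start with a capacity that is "almost always" big enough */
#define ITER_INITIAL_CAP 64

typedef struct TidStoreIterResult
{
	BlockNumber blkno;
	int num_offsets;
	int max_offsets;			/* currently allocated capacity */
	OffsetNumber *offsets;		/* separately allocated, resizable */
} TidStoreIterResult;

static void
iter_result_init(TidStoreIterResult *result)
{
	result->num_offsets = 0;
	result->max_offsets = ITER_INITIAL_CAP;
	/* palloc() in PostgreSQL */
	result->offsets = malloc(sizeof(OffsetNumber) * result->max_offsets);
}

/* Append one offset, doubling the array when full (repalloc in PostgreSQL) */
static void
iter_result_add(TidStoreIterResult *result, OffsetNumber off)
{
	if (result->num_offsets >= result->max_offsets)
	{
		result->max_offsets *= 2;
		result->offsets = realloc(result->offsets,
								  sizeof(OffsetNumber) * result->max_offsets);
	}
	result->offsets[result->num_offsets++] = off;
}
```

With this shape, callers of TidStoreIterateNext never need to know the relation's max_offset up front, which is what would let it drop out of the parallel_vacuum_init() parameter list.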

>
> > > v45-0010:
> > >
> > > Thinking about this some more, I'm not sure we need to do anything
> > > different for the *starting* segment size. (Controlling *max* size
> > > does seem important, however.) For the corner case of m_w_m = 1MB,
> > > it's fine if vacuum quits pruning immediately after (in effect) it
> > > finds the DSA has gone to 2MB. It's not worth bothering with, IMO. If
> > > the memory accounting starts >1MB because we're adding the trivial
> > > size of some struct, let's just stop doing that. The segment
> > > allocations are what we care about.
> >
> > IIUC it's for work_mem, whose minimum value is 64kB.
> >
> > >
> > > v45-0011:
> > >
> > > + /*
> > > + * max_bytes is forced to be at least 64kB, the current minimum valid
> > > + * value for the work_mem GUC.
> > > + */
> > > + max_bytes = Max(64 * 1024L, max_bytes);
> > >
> > > Why?
> >
> > This is to avoid creating a radix tree within very small memory. The
> > minimum work_mem value is a reasonable lower bound that PostgreSQL
> > uses internally. It's actually copied from tuplesort.c.
>
> There is no explanation for why it should be done like tuplesort.c. Also...
>
> - tree->leaf_ctx = SlabContextCreate(ctx,
> - "radix tree leaves",
> - RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> - sizeof(RT_VALUE_TYPE));
> + tree->leaf_ctx = SlabContextCreate(ctx,
> + "radix tree leaves",
> + Min(RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> + work_mem),
> + sizeof(RT_VALUE_TYPE));
>
> At first, my eyes skipped over this apparent re-indent, but hidden
> inside here is another (undocumented) attempt to clamp the size of
> something. There are too many of these sprinkled in various places,
> and they're already a maintenance hazard -- a different one was left
> behind in v45-0011:
>
> @@ -201,6 +183,7 @@ TidStoreCreate(size_t max_bytes, int max_off,
> dsa_area *area)
> ts->control->max_bytes = max_bytes - (70 * 1024);
> }
>
> Let's do it in just one place. In TidStoreCreate(), do
>
> /* clamp max_bytes to at least the size of the empty tree with
> allocated blocks, so it doesn't immediately appear full */
> ts->control->max_bytes = Max(max_bytes, {rt, shared_rt}_memory_usage);
>
> Then we can get rid of all the worry about 1MB/2MB, 64kB, 70kB -- all that.
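A minimal sketch of that single clamp, with assumptions made explicit: the helper name and the 1MB figure stand in for whatever the empty tree's reported memory usage actually is (for the shared case, plausibly one initial DSA segment); neither is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define Max(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical stand-in for RT_MEMORY_USAGE() on a freshly created tree:
 * the memory an empty tree occupies once its initial blocks/segments are
 * allocated. 1MB is an illustrative value for the shared (DSA) case.
 */
static size_t
rt_memory_usage_empty(void)
{
	return 1024 * 1024;
}

static size_t
clamp_max_bytes(size_t max_bytes)
{
	/*
	 * Clamp max_bytes to at least the size of the empty tree with
	 * allocated blocks, so the store doesn't immediately appear full.
	 */
	return Max(max_bytes, rt_memory_usage_empty());
}
```

Note the trade-off this makes concrete: a caller asking for a 64kB limit ends up with an effective limit equal to the empty tree's footprint, which is the point raised in the reply below.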

But doesn't that mean that even if we create a shared tidstore with a
small memory limit, say 64kB, it actually uses 1MB?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
