Quick Links

Re: tableam vs. TOAST

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: tableam vs. TOAST
Date:	2019-08-02 22:42:51
Message-ID:	20190802224251.lsxw4o5ebn2ng5ey@alap3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2019-08-01 12:23:42 -0400, Robert Haas wrote:
> Roughly, on both master and with the patches, the first one takes
> about 4.2 seconds, the second 7.5, and the third 1.2. The third one
> is the fastest because it doesn't do any compression. Since it does
> less irrelevant work than the other two cases, it has the best chance
> of showing up any performance regression that the patch has caused --
> if any regression existed, I suppose that it would be an increased
> per-toast-fetch or per-toast-chunk overhead. However, I can't
> reproduce any such regression. My first attempt at testing that case
> showed the patch about 1% slower, but that wasn't reliably
> reproducible when I did it a bunch more times. So as far as I can
> figure, this patch does not regress performance in any
> easily-measurable way.

Hm, those all include writing, right? And for read-only we don't expect
any additional overhead, correct? The write overhead is probably too
large show a bit of function call overhead - but if so, it'd probably be
on unlogged tables? And with COPY, because that utilizes multi_insert,
which means more toasting in a shorter amount of time?

.oO(why does everyone attach attachements out of order? Is that
a gmail thing?)

> From a4c858c75793f0f8aff7914c572a6615ea5babf8 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: Mon, 8 Jul 2019 11:58:05 -0400
> Subject: [PATCH 1/4] Split tuptoaster.c into three separate files.
>
> detoast.c/h contain functions required to detoast a datum, partially
> or completely, plus a few other utility functions for examining the
> size of toasted datums.
>
> toast_internals.c/h contain functions that are used internally to the
> TOAST subsystem but which (mostly) do not need to be accessed from
> outside.
>
> heaptoast.c/h contains code that is intrinsically specific to the
> heap AM, either because it operates on HeapTuples or is based on the
> layout of a heap page.
>
> detoast.c and toast_internals.c are placed in
> src/backend/access/common rather than src/backend/access/heap. At
> present, both files still have dependencies on the heap, but that will
> be improved in a future commit.

I wonder if toasting.c should be moved too?

If anybody doesn't know git's --color-moved, it makes patches like this
a lot easier to review.

> index 00000000000..582af147ea1
> --- /dev/null
> +++ b/src/include/access/detoast.h
> @@ -0,0 +1,92 @@
> +/*-------------------------------------------------------------------------
> + *
> + * detoast.h
> + * Access to compressed and external varlena values.
>
> Hm. Does it matter that that also includes stuff like expanded datums?
>
> + * Copyright (c) 2000-2019, PostgreSQL Global Development Group
> + *
> + * src/include/access/detoast.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef DETOAST_H
> +#define DETOAST_H

trailing whitespace after "#ifndef DETOAST_H ".

> commit 60d51e6510c66f79c51e43fe22730fe050d87854
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: 2019-07-08 12:02:16 -0400
>
> Create an API for inserting and deleting rows in TOAST tables.
>
> This moves much of the non-heap-specific logic from toast_delete and
> toast_insert_or_update into a helper functions accessible via a new
> header, toast_helper.h. Using the functions in this module, a table
> AM can implement creation and deletion of TOAST table rows with
> much less code duplication than was possible heretofore. Some
> table AMs won't want to use the TOAST logic at all, but for those
> that do this will make that easier.
>
> Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com

Hm. This leaves toast_insert_or_update() as a name. That makes it sound
like it's generic toast code, rather than heap specific?

It's definitely nice how a lot of repetitive code has been deduplicated.
Also makes it easier to see how algorithmically expensive
toast_insert_or_update() is :(.

> /*
> * Second we look for attributes of attstorage 'x' or 'e' that are still
> * inline, and make them external. But skip this if there's no toast
> * table to push them to.
> */
> while (heap_compute_data_size(tupleDesc,
> toast_values, toast_isnull) > maxDataLen &&
> rel->rd_rel->reltoastrelid != InvalidOid)

Shouldn't this condition be the other way round?

> if (for_compression)
> skip_colflags |= TOASTCOL_INCOMPRESSIBLE;
>
> for (i = 0; i < numAttrs; i++)
> {
> Form_pg_attribute att = TupleDescAttr(tupleDesc, i);
>
> if ((ttc->ttc_attr[i].tai_colflags & skip_colflags) != 0)
> continue;
> if (VARATT_IS_EXTERNAL(DatumGetPointer(ttc->ttc_values[i])))
> continue; /* can't happen, toast_action would be 'p' */
> if (for_compression &&
> VARATT_IS_COMPRESSED(DatumGetPointer(ttc->ttc_values[i])))
> continue;
> if (check_main && att->attstorage != 'm')
> continue;
> if (!check_main && att->attstorage != 'x' && att->attstorage != 'e')
> continue;
>
> if (ttc->ttc_attr[i].tai_size > biggest_size)
> {
> biggest_attno = i;
> biggest_size = ttc->ttc_attr[i].tai_size;
> }
> }

Couldn't most of these be handled via colflags, instead of having
numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but the
size check ought to boil down to a single mask test?

> extern void toast_tuple_init(ToastTupleContext *ttc);
> extern int toast_tuple_find_biggest_attribute(ToastTupleContext *ttc,
> bool for_compression,
> bool check_main);
> extern void toast_tuple_try_compression(ToastTupleContext *ttc, int attribute);
> extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
> int options, int max_chunk_size);
> extern void toast_tuple_cleanup(ToastTupleContext *ttc, bool cleanup_toastrel);

I wonder if a better prefix wouldn't be toasting_...

> +/*
> + * Information about one column of a tuple being toasted.
> + *
> + * NOTE: toast_action[i] can have these values:
> + * ' ' default handling
> + * 'p' already processed --- don't touch it
> + * 'x' incompressible, but OK to move off
> + *
> + * NOTE: toast_attr[i].tai_size is only made valid for varlena attributes with
> + * toast_action[i] different from 'p'.
> + */
> +typedef struct
> +{
> + struct varlena *tai_oldexternal;
> + int32 tai_size;
> + uint8 tai_colflags;
> +} ToastAttrInfo;

I think that comment is outdated?

> +/*
> + * Flags indicating the overall state of a TOAST operation.
> + *
> + * TOAST_NEEDS_DELETE_OLD indicates that one or more old TOAST datums need
> + * to be deleted.
> + *
> + * TOAST_NEEDS_FREE indicates that one or more TOAST values need to be freed.
> + *
> + * TOAST_HAS_NULLS indicates that nulls were found in the tuple being toasted.
> + *
> + * TOAST_NEEDS_CHANGE indicates that a new tuple needs to built; in other
> + * words, the toaster did something.
> + */
> +#define TOAST_NEEDS_DELETE_OLD 0x0001
> +#define TOAST_NEEDS_FREE 0x0002
> +#define TOAST_HAS_NULLS 0x0004
> +#define TOAST_NEEDS_CHANGE 0x0008

I'd make these enums. They're more often accessible in a debugger...

> commit 9e4bd383a00e6bb96088666e57673b343049345c
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: 2019-08-01 10:37:02 -0400
>
> Allow TOAST tables to be implemented using table AMs other than heap.
>
> toast_fetch_datum, toast_save_datum, and toast_delete_datum are
> adjusted to use tableam rather than heap-specific functions. This
> might have some performance impact, but this patch attempts to
> mitigate that by restructuring things so that we don't open and close
> the toast table and indexes multiple times per tuple.
>
> tableam now exposes an integer value (not a callback) for the
> maximum TOAST chunk size, and has a new callback allowing table
> AMs to specify the AM that should be used to implement the TOAST
> table. Previously, the toast AM was always the same as the table AM.
>
> Patch by me, tested by Prabhat Sabu.
>
> Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com

I'm quite unconvinced that making the chunk size specified by the AM is
a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
pg_control etc. It seems a bit dangerous to let AMs provide the size,
without being very clear that any change of the value will make data
inaccessible. It'd be different if the max were only used during
toasting.

I think the *size* checks should be weakened so we check:
1) After each chunk, whether the already assembled chunks exceed the
expected size.
2) After all chunks have been collected, check that the size is exactly
what we expect.

I don't think that removes a meaningful amount of error
checking. Missing tuples etc get detected by the chunk_ids not being
consecutive. The overall size is still verified.

The obvious problem with this is the slice fetching logic. For slices
with an offset of 0, it's obviously trivial to implement. For the higher
slice logic, I'd assume we'd have to fetch the first slice by estimating
where the start chunk is based on the current suggest chunk size, and
restarting the scan earlier/later if not accurate. In most cases it'll
be accurate, so we'd not loose efficiency.

Greetings,

Andres Freund

In response to

Re: tableam vs. TOAST at 2019-08-01 16:23:42 from Robert Haas

Responses

Re: tableam vs. TOAST at 2019-08-05 19:36:59 from Robert Haas
Re: tableam vs. TOAST at 2019-09-12 13:03:43 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2019-08-02 22:47:07	Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions
Previous Message	Peter Geoghegan	2019-08-02 22:42:29	Re: The unused_oids script should have a reminder to use the 8000-8999 OID range