detoast datum into the given buffer as a optimization.

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: David Rowley <dgrowley(at)gmail(dot)com>,Tomas Vondra <tomas(at)vondra(dot)me>
Subject: detoast datum into the given buffer as a optimization.
Date: 2024-09-18 09:35:56
Message-ID: 87ikutjocj.fsf@163.com
Lists: pgsql-hackers


Hi,

Currently detoast_attr always detoasts the data into palloc'd memory, so
if a caller wants the detoasted data in different memory, it has to copy
the data there. I'm wondering whether we could add a buffer as an optional
argument to detoast_attr to avoid that waste.

current format:

/* ----------
* detoast_attr -
*
* Public entry point to get back a toasted value from compression
* or external storage. The result is always non-extended varlena form.
*
* Note some callers assume that if the input is an EXTERNAL or COMPRESSED
* datum, the result will be a pfree'able chunk.
* ----------
*/
struct varlena *
detoast_attr(struct varlena *attr)

new format:

/* ----------
* detoast_attr -
*
* ...
*
* Note: if the caller provides a non-NULL buffer, it is the caller's duty
* to make sure it has enough room for the detoasted value (usually
* toast_raw_datum_size can be used to get the size). If the buffer is
* NULL, palloc'd memory under CurrentMemoryContext is used.
*/

struct varlena *
detoast_attr(struct varlena *attr, char *buffer)
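
To make the calling convention concrete, here is a minimal standalone
sketch of the "optional buffer" contract. It is a toy model, not
PostgreSQL code: ToyDatum, toy_raw_size and toy_detoast are hypothetical
names, malloc stands in for palloc, a plain memcpy stands in for the real
decompress/fetch work, and the varlena header is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a toasted value. */
typedef struct ToyDatum
{
    size_t      rawsize;    /* size of the expanded value, known up front */
    const char *stored;     /* stand-in for compressed/external storage */
} ToyDatum;

/* Plays the role of toast_raw_datum_size(): no expansion work is done. */
static size_t
toy_raw_size(const ToyDatum *d)
{
    return d->rawsize;
}

/*
 * Plays the role of the proposed detoast_attr(attr, buffer).
 *
 * buffer != NULL: the caller guarantees at least toy_raw_size(d) bytes,
 *                 and the result is written there.
 * buffer == NULL: the result is freshly allocated (malloc here, palloc
 *                 in the real function) and the caller must free it.
 */
static char *
toy_detoast(const ToyDatum *d, char *buffer)
{
    char       *result;

    assert(d != NULL);

    result = buffer ? buffer : malloc(d->rawsize);
    if (result == NULL)
        return NULL;        /* allocation failed */

    memcpy(result, d->stored, d->rawsize);
    return result;
}
```

A caller that wants the result in a particular place asks for the size
first, allocates there, and passes the pointer in; a caller that passes
NULL gets today's behavior.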

There are at least 2 use cases:

1. The shared detoast datum patch at [1], where I want to avoid the
duplicated detoast effort for the same datum, for example:

SELECT f(big_toast_col) FROM t WHERE g(big_toast_col);

Current master detoasts it twice.

In that patch, I want to detoast the datum into a MemoryContext whose
lifespan is the same as slot->tts_values[*], rather than into
CurrentMemoryContext, so that the result can be reused in a different
expression. With the proposal here, we can detoast the datum into the
desired MemoryContext directly (just allocating the buffer in the desired
MemoryContext is enough).

2. Make the printtup function a bit faster [2]. That patch already removes
some palloc and memcpy effort, but there is still room to optimize
further. For example, the text_out function still detoasts the datum into
palloc'd memory and then copies it into a StringInfo.

One key point is that we can always get the varlena raw size cheaply, in
advance and without doing any real detoast work, thanks to the existing
varlena design.
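
As a rough illustration of the second use case, the sketch below reserves
space in a growable string buffer using the size known in advance and
expands the value directly into it, so no intermediate allocation or copy
is needed. It is a toy model, not PostgreSQL code: ToyValue, ToyStrBuf and
the toy_* functions are hypothetical stand-ins (ToyStrBuf for StringInfo,
realloc for the real enlarge logic), and the varlena header is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a toasted value. */
typedef struct ToyValue
{
    size_t      rawsize;    /* expanded size, available without expanding */
    const char *stored;     /* stand-in for compressed/external storage */
} ToyValue;

/* Hypothetical stand-in for StringInfo. */
typedef struct ToyStrBuf
{
    char       *data;
    size_t      len;        /* bytes used, excluding the trailing NUL */
    size_t      cap;        /* bytes allocated */
} ToyStrBuf;

/* Make room for `needed` more bytes plus a NUL; returns 0 on success. */
static int
toy_buf_reserve(ToyStrBuf *b, size_t needed)
{
    size_t      want = b->len + needed + 1;

    if (want > b->cap)
    {
        size_t      newcap = b->cap ? b->cap : 16;
        char       *p;

        while (newcap < want)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    return 0;
}

/* Stand-in for "detoast into the given buffer": memcpy plays the decoder. */
static void
toy_expand_into(const ToyValue *v, char *dst)
{
    memcpy(dst, v->stored, v->rawsize);
}

/* Append the expanded value to the buffer without a temporary copy. */
static int
toy_append_value(ToyStrBuf *b, const ToyValue *v)
{
    assert(b != NULL && v != NULL);

    if (toy_buf_reserve(b, v->rawsize) != 0)
        return -1;
    toy_expand_into(v, b->data + b->len);   /* write at the buffer's tail */
    b->len += v->rawsize;
    b->data[b->len] = '\0';
    return 0;
}
```

The point of the design is the ordering: size first, reserve second,
expand last, which only works because the size is available before any
expansion happens.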

If this can be accepted, it would reduce the size of patch [2] to some
extent, and the part it removes is the part disliked by Thomas (and
me..) [3].

What do you think?

[1] https://commitfest.postgresql.org/49/4759/
[2] https://www.postgresql.org/message-id/87wmjzfz0h.fsf%40163.com
[3] https://www.postgresql.org/message-id/6718759c-2dac-48e4-bf18-282de4d82204%40enterprisedb.com

--
Best Regards
Andy Fan
