Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data
Date: 2018-11-19 09:28:35
Message-ID: CAD21AoB0EfKjgvo3wsT8qhbUpUR=e-B0iqxcv16A-EQujxp0xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> It seems we have pretty annoying problem with logical decoding when
> performing VACUUM FULL / CLUSTER on a table with toast-ed data.
>
> The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
> records, the TOAST data is logged as regular INSERT records. XLOG/FPI is
> ignored in logical decoding, and so reorderbuffer never gets those
> records. But we do decode the TOAST data, and reorderbuffer stashes them
> in toast_hash hash table. Which gets reset only when handling a row from
> the main heap, and that never arrives. So we end up stashing all the
> TOAST data in memory :-(
>
> So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
> likely to break any logical replication connection. And it does not
> matter if you replicate this particular table.
>
> Luckily enough, this can leverage some of the pieces introduced by
> e9edc1ba which was meant to deal with rewrites of system tables, and in
> raw_heap_insert it added this:
>
> /*
> * The new relfilenode's relcache entrye doesn't have the necessary
> * information to determine whether a relation should emit data for
> * logical decoding. Force it to off if necessary.
> */
> if (!RelationIsLogicallyLogged(state->rs_old_rel))
> options |= HEAP_INSERT_NO_LOGICAL;
>
> As raw_heap_insert is used only for heap rewrites, we can simply remove
> the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
> data logged from here.
>

This fix seems fine to me.

> This does fix the issue, because we still decode the TOAST changes but
> there are no data and so
>
> if (change->data.tp.newtuple != NULL)
> {
> dlist_delete(&change->node);
> ReorderBufferToastAppendChunk(rb, txn, relation,
> change);
> }
>
> ends up not stashing the change in the hash table.

With the below change you proposed can we remove the above condition
because toast-insertion changes being processed by the reorder buffer
always have a new tuple? If a toast-insertion record doesn't have a
new tuple it has already ignored when decoding.

> It's imperfect,
> because we still decode the changes (and stash them to disk), but ISTM
> that can be fixed by tweaking DecodeInsert a bit to just ignore those
> changes entirely.
>
> Attached is a PoC patch with these two fixes.
>

I think this change is also fine.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2018-11-19 09:30:44 Trigger tuple slotification (extracted from pluggable storage patch)
Previous Message Pavel Stehule 2018-11-19 08:33:58 Re: ToDo: show size of partitioned table