Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)
Date: 2018-09-28 01:23:40
Message-ID: CAKJS1f9BbJ+FY3TbdCiap3qXHTFOiwtays9s36-oQkkM9_R5bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 20 Sep 2018 at 11:31, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2018-09-19 12:06:47 +0900, Michael Paquier wrote:
> > On Wed, Sep 19, 2018 at 01:14:10PM +1200, David Rowley wrote:
> > > Wouldn't it be better to modify copy.c to just perform the heap_sync
> > > on just the partitions it touches?
> >
> > Yeah, my gut is telling me that this would be the best approach for now,
> > still I am not sure that this is the best move in the long term.
>
> ISTM heap_sync() would be the entirely wrong layer to handle
> partitioning. For several reasons: 1) With pluggable storage, we want to
> have multiple different table implementations, doing the syncing on the
> heap_* for partitions would thus be wrong. 2) In just about all cases we
> only want to sync a few partitions, there's not really a use-case for
> doing syncs across all partitions imo.

I started looking into this and noticed that during the check where we
set the hi_options bits to determine if we can skip WAL and FSM
checks, we only check the table that's the target of the COPY command.
That's fine for normal tables, but for partitioned tables, it's pretty
wrong. We could end up skipping FSM checks for partitions that have
concurrent DML being performed which could bloat the table. We could
also skip writing WAL for partitions that should have had WAL written.

To fix this we'd need to also know each partition that we're about to
route tuples to was also created or truncated in the same transaction,
but we can't know that just yet since we've not opened up any
partitions to check yet. I think the best back-patchable fix for this
is just to disable the optimization for partitioned tables.

I also noticed that we support COPY FREEZE for partitioned tables and
attempt to just check if the partitioned table was created or
truncated in the current transaction. This is also wrong as even if
the partitioned table had been just created, it's partitions may have
already existed and actually be visible to other transactions. It
seems best to just error out if anyone tries to do COPY FREEZE with a
partitioned table as there is no option, in this case, to build
per-partition hi_options lazily since we must error out if FREEZE
cannot be supported before the COPY starts processing tuples.

> > All the other callers of heap_sync don't care about partitioned
> > tables, so we could add an assertion on RELKIND_PARTITIONED_TABLE.
>
> Or rather, it should assert the expected relkinds?

I think an Assert() in heap_sync is a good idea. As far as I know, it
should only ever be used for normal tables or materialized views. So
perhaps best to assert the rel is one of those?

I've attached a patch to assist discussion on the problem with setting
the incorrect hi_options for partitioned tables.

Comments are welcome.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
fix_incorrect_setting_of_hi_options_for_partitioned_tables.patch application/octet-stream 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2018-09-28 02:15:41 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Amit Langote 2018-09-28 01:21:16 Re: Postgres 11 release notes