Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Date: 2012-06-27 00:13:18
Message-ID: 201206270213.18367.andres@2ndquadrant.com
Lists: pgsql-hackers

Hi Steve,

On Tuesday, June 26, 2012 02:14:22 AM Steve Singer wrote:
> > I planned to have some cutoff 'max_changes_in_memory_per_txn' value.
> > If it has
> > been reached for one transaction all existing changes are spilled to
> > disk. New changes again can be kept in memory till its reached again.
> Do you want max_changes_in_memory_per_txn or do you want to put a limit
> on the total amount of memory that the cache is able to use? How are you
> going to tell a DBA to tune max_changes_in_memory_per_txn? They know how
> much memory their system has and how much of it they can devote to the
> apply cache versus other things; asking them to estimate how many open
> transactions they might have at a point in time and how many WAL change
> records each transaction generates seems like a step backwards from the
> progress we've been making in getting PostgreSQL to be easier to tune.
> The maximum number of transactions that could be open at a time is
> governed by max_connections on the master at the time the WAL was
> generated, so I don't even see how the machine processing the WAL
> records could autotune/guess that.
It can even be significantly higher than max_connections because
subtransactions are only recognizable as part of their parent transaction
upon commit.

I think max_changes_in_memory_per_txn will count the number of changes for
now. Making memory-based accounting work efficiently and correctly across
multiple concurrent transactions isn't easy.
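
A minimal sketch of what such count-based accounting could look like (all
types and names below are illustrative stand-ins, not the patch's actual
structures):

#include <stddef.h>

/* Illustrative only: simplified stand-in for a per-transaction state. */
typedef struct TxnState
{
    size_t nentries_mem;        /* changes currently held in memory */
} TxnState;

static size_t max_changes_in_memory_per_txn = 4096;    /* made-up default */

/* called for every decoded change belonging to this transaction */
static void
add_change(TxnState *txn)
{
    txn->nentries_mem++;

    if (txn->nentries_mem >= max_changes_in_memory_per_txn)
    {
        /* spill_txn_to_disk(txn): write all in-memory changes out */
        txn->nentries_mem = 0;  /* new changes accumulate in memory again */
    }
}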

> > We need to support serializing the cache for crash recovery + shutdown of
> > the receiving side as well. Depending on how we do the wal decoding we
> > will need it more frequently...
> Have you described your thoughts on crash recovery on another thread?
I think I have somewhere, but given how much our thoughts on decoding are
still in flux, I don't think it's that important yet.

> I am thinking that this module would have to serialize some state
> every time it calls cache->commit() to ensure that consumers don't get
> invoked twice on the same transaction.
In one of the other patches I implemented it by adding the (origin_id,
origin_lsn) pair to replicated commits. During recovery the startup process
sets up in shared memory the point up to which we have already applied.
If you then, every now and then, perform a 'logical checkpoint' that writes
down the beginning LSN of the longest-running in-progress transaction, you
can fully recover from that point on.
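
Roughly, the recovered state and the restart check could look like this (a
sketch; the types and field names are made up for illustration and are not
the patch's actual shared memory layout):

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t LSN;           /* stand-in for PostgreSQL's XLogRecPtr */

/* what the startup process would set up during recovery (illustrative) */
typedef struct ReplicationProgress
{
    uint16_t origin_id;         /* node the changes originated on */
    LSN      origin_lsn;        /* last origin commit known to be applied */
} ReplicationProgress;

/*
 * Decoding restarts at the logical checkpoint's start LSN and uses this
 * check to skip transactions that were already applied, so no consumer
 * sees the same commit twice.
 */
static bool
commit_already_applied(const ReplicationProgress *p,
                       uint16_t commit_origin_id, LSN commit_origin_lsn)
{
    return commit_origin_id == p->origin_id &&
           commit_origin_lsn <= p->origin_lsn;
}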

> If the apply module is making changes to the same backend that the apply
> cache serializes to, then both the state for the apply cache and the
> changes that committed transactions make will be persisted (or not
> persisted) together. What if I am replicating from x86 to x86_64
> via an apply module that does textout conversions?
>
>   x86                Proxy                           x86_64
>
>   ----WAL------>   apply cache
>                        |  (proxy catalog)
>                    apply module
>                    textout  --------------------->  SQL statements
>
>
> How do we ensure that the commits visible (or not visible) in the
> catalog on the proxy instance used for decoding WAL, in the destination
> database, and in the state + spill files of the apply cache all stay
> consistent in the event of a crash of either the proxy or the target?
> I don't think you can (unless we consider two-phase commit, and I'd
> rather we didn't). Can we come up with a way of avoiding the need for
> them to be consistent with each other?
That's discussed in the "Catalog/Metadata consistency during changeset
extraction from wal" thread and we haven't yet determined which solution is
the best ;)

> Code Review
> =========
>
> applycache.h
> -----------------------
> +typedef struct ApplyCacheTupleBuf
> +{
> + /* position in preallocated list */
> + ilist_s_node node;
> +
> + HeapTupleData tuple;
> + HeapTupleHeaderData header;
> + char data[MaxHeapTupleSize];
> +} ApplyCacheTupleBuf;
>
> Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big
> the data in the transaction is? Wouldn't workloads with inserts of lots
> of small rows in a transaction eat up lots of memory that is allocated
> but storing nothing? The only alternative I can think of is dynamically
> allocating these, and I don't know what the cost/benefit of that overhead
> will be versus spilling to disk sooner.
Dynamically allocating them totally destroys performance; I tried that. I
think at some point we should have four or so lists of preallocated tuple
bufs of different sizes and then use the smallest possible one. But I think
this solution is OK in the very first version.

If you allocate dynamically you also get a noticeable performance drop when
you let the decoding run for a while, because of fragmentation inside the
memory allocator.
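
For illustration, such size-classed freelists could look roughly like this
(made-up names, plain malloc instead of a PostgreSQL memory context):

#include <stddef.h>
#include <stdlib.h>

#define N_SIZE_CLASSES 4

static const size_t size_classes[N_SIZE_CLASSES] = {64, 512, 2048, 8192};

typedef struct TupleBuf
{
    struct TupleBuf *next;      /* freelist link while unused */
    int class_id;               /* which freelist it belongs to */
    /* tuple data of up to size_classes[class_id] bytes follows */
} TupleBuf;

static TupleBuf *freelists[N_SIZE_CLASSES];

static TupleBuf *
get_tuple_buf(size_t datalen)
{
    int i;

    for (i = 0; i < N_SIZE_CLASSES; i++)
    {
        if (datalen <= size_classes[i])
        {
            TupleBuf *buf = freelists[i];

            if (buf != NULL)
                freelists[i] = buf->next;    /* reuse a preallocated buffer */
            else
                buf = malloc(sizeof(TupleBuf) + size_classes[i]);
            if (buf != NULL)
                buf->class_id = i;
            return buf;
        }
    }
    return NULL;                /* bigger than the largest size class */
}

static void
release_tuple_buf(TupleBuf *buf)
{
    buf->next = freelists[buf->class_id];    /* back onto its freelist */
    freelists[buf->class_id] = buf;
}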

> +* FIXME: better name
> + */
> +ApplyCacheChange*
> +ApplyCacheGetChange(ApplyCache*);
>
> How about:
>
> ApplyCacheReserveChangeStruct(..)
> ApplyCacheReserveChange(...)
> ApplyCacheAllocateChange(...)
>
> as ideas?
> +/*
> + * Return an unused ApplyCacheChange struct
> +*/
> +void
> +ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);
>
> ApplyCacheReleaseChange(...) ? I keep thinking of 'Return' as us
> returning the data somewhere not the memory.
Hm. Reserve/Release doesn't sound bad. Acquire/Release is possibly even
better, because 'reserve' could be understood as a preparatory step?

> applycache.c:
> -------------------
>
> I've taken a quick look through this file and I don't see any issues
> other than the many FIXME's and other issues you've identified already,
> which I don't expect you to address in this CF.
Thanks for the review so far!

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
