Re: logical changeset generation v6.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.2
Date: 2013-10-28 15:54:31
Message-ID: CA+TgmoZtu0UcygHCg=+cz9T3g4TzmmYzN1LJr3TeKWWOMXxfMQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 25, 2013 at 7:57 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> However, I'm leery about the idea of using a relation fork for this.
>> I'm not sure whether that's what you had in mind, but it gives me the
>> willies. First, it adds distributed overhead to the system, as
>> previously discussed; and second, I think the accounting may be kind
>> of tricky, especially in the face of multiple rewrites. I'd be more
>> inclined to find a separate place to store the mappings. Note that,
>> AFAICS, there's no real need for the mapping file to be
>> block-structured, and I believe they'll be written first (with no
>> readers) and subsequently only read (with no further writes) and
>> eventually deleted.
>
> I was thinking of storing it along other data used during logical
> decoding and let decoding's cleanup clean up that data as well. All the
> information for that should be there.

That seems OK.

> There's one snag I currently can see, namely that we actually need to
> prevent that a formerly dropped relfilenode is getting reused. Not
> entirely sure what the best way for that is.

I'm not sure in detail, but it seems to me that this is all part of the
same picture. If you're tracking changed relfilenodes, you'd better
track dropped ones as well. Completely aside from this issue, what
keeps a relation from being dropped before we've decoded all of the
changes made to its data before the point at which it was dropped? (I
hope the answer isn't "nothing".)

>> One possible objection to this is that it would preclude decoding on a
>> standby, which seems like a likely enough thing to want to do. So
>> maybe it's best to WAL-log the changes to the mapping file so that the
>> standby can reconstruct it if needed.
>
> The mapping file probably can be one big wal record, so it should be
> easy enough to do.

It might be better to batch it, because if you rewrite a big relation,
and the record is really big, everyone else will be frozen out of
inserting WAL for as long as that colossal record is being written and
synced. If it's inserted in reasonably-sized chunks, the rest of the
system won't be starved as badly.
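
To make the batching idea concrete, here is a rough sketch of splitting one large mapping payload into bounded chunks that could each be logged as a separate record, and of reassembling them on the reading side. It is illustrative Python, not PostgreSQL code: the names split_mapping and reassemble and the 8192-byte chunk size are assumptions for the example, and nothing here reflects an actual WAL record format.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical chunk size; the text above only asks for "reasonably-sized chunks".
CHUNK_BYTES = 8192


def split_mapping(payload: bytes,
                  chunk_bytes: int = CHUNK_BYTES) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that cover the payload in order.

    Each pair stands in for one bounded record, so no single insert
    holds up other writers for the length of the whole mapping.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for offset in range(0, len(payload), chunk_bytes):
        yield offset, payload[offset:offset + chunk_bytes]


def reassemble(records: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the mapping from (offset, chunk) records, as a reader might.

    The offset check rejects gaps and reordering instead of silently
    producing a corrupt mapping.
    """
    out = bytearray()
    for offset, chunk in records:
        if offset != len(out):
            raise ValueError("gap or reordering in mapping records")
        out += chunk
    return bytes(out)
```

Carrying the offset in each chunk is one possible design choice: it lets the reader verify that it has seen every piece in order before using the mapping.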

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
