From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Joachim Wieland <joe(at)mcknight(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: patch for parallel pg_dump |
Date: | 2012-01-27 15:58:11 |
Message-ID: | CA+TgmoY3T3apag-7Gw6i-VOpaUwOuGneVQECkwNYBumNcUGJ2A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jan 27, 2012 at 10:57 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
>> So this is the parallel pg_dump patch, generalizing the existing
>> parallel restore and allowing parallel dumps for the directory archive
>> format, the patch works on Windows and Unix.
>
> This patch introduces a large amount of notational churn, as perhaps
> well-illustrated by this hunk:
>
> static int
> ! dumpBlobs(Archive *AHX, void *arg)
> {
> + /*
> + * This is a data dumper routine, executed in a child for
> parallel backup,
> + * so it must not access the global g_conn but AH->connection instead.
> + */
> + ArchiveHandle *AH = (ArchiveHandle *) AHX;
>
> It seems pretty grotty to have a static function that gets an argument
> of the wrong type, and then just immediately turns around and casts it
> to something else. It's not exactly obvious that that's even safe,
> especially if you don't know that ArchiveHandle is a struct whose
> first element is an Archive. But even if you do know that subclassing
> is intended, that doesn't prove that the particular Archive object is
> always going to be an ArchiveHandle under the hood. If it is, why not
> just pass it as an ArchiveHandle to begin with? I think this needs to
> be revised in some way. At least in the few cases I checked, the only
> point of getting at the ArchiveHandle this way was to find
> AH->connection, which suggests to me either that AH->connection should
> be in the "public" section, inside Archive rather than ArchiveHandle,
> or else that we ought to just pass the connection object to this
> function (and all of its friends who have similar issues) as a
> separate argument. Either way, I think that would make this patch
> both cleaner and less-invasive. In fact we might want to pull out
> just that change and commit it separately to simplify review of the
> remaining work.
>
> It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
> The way you have it, fmtQualifiedId() is now with fmtId(), but no
> longer with fmtCopyColumnList(), the only other similarly named
> function in that directory. That may be more logical, or it may not,
> but rearranging the code like this makes it a lot harder to review,
> and I would prefer that we make such changes as separate commits if
> we're going to do them, so that diff can do something sensible with
> the changes that are integral to the patch.
Woops, hit send a little too soon there. I'll try to make some more
substantive comments after looking at this more.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | karavelov | 2012-01-27 16:06:53 | TS: Limited cover density ranking |
Previous Message | Robert Haas | 2012-01-27 15:57:41 | Re: patch for parallel pg_dump |