Re: walmethods.c/h are doing some strange things

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walmethods.c/h are doing some strange things
Date: 2022-09-19 18:02:27
Message-ID: CA+TgmoYZ2Epi=OYkxQ=LEV=yBvi483PW8CfGJ2m9j8qQJALc6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 15, 2022 at 9:39 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Fri, 2 Sep 2022 11:52:38 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> > that type that can ever exist, and the pointer to that object is
> > stored in a global variable managed by walmethods.c. So whereas in
> > other cases we give you the object and then a way to get the
> > corresponding set of callbacks, here we only give you the callbacks,
> > and we therefore have to impose the artificial restriction that there
> > can only ever be one object.
>
> Makes sense to me.

OK, I have committed the patches. Before doing that, I fixed a couple
of bugs in the first one, and did a little bit of rebasing of the
second one.

> I agree to the view. That part seems to be a part of
> open_for_write()'s body functions. But, I'm not sure how we untangle
> them at a glance, too. In the first place, I'm not sure why we need
> to do that despite the file going to be overwritten from the
> beginning, though..

I suspect that we're going to have to untangle things bit by bit.

One place where I think we might be able to improve things is with the
handling of compression suffixes (e.g. .gz, .lz4) and temp suffixes
(e.g. .partial, .tmp). At present, responsibility for adding these
suffixes to pathnames is spread across receivelog.c and walmethods.c
in a way that, to me, looks pretty random. It's not exactly clear to
me what the best design is here right now, but I think either (A)
receivelog.c should take full responsibility for computing the exact
filename and walmethods.c should just blindly write the data into the
exact filename it's given, or else (B) receivelog.c should take no
responsibility for pathname construction and the fact that there is
pathname munging happening should be hidden inside walmethods.c. Right
now, walmethods.c is doing pathname munging, but the munging is
visible from receivelog.c, so the responsibility is spread across the
two files rather than being the sole responsibility of either.

What's also a little bit aggravating about this is that it doesn't
feel like we're accomplishing all that much code reuse here.
walmethods.c and receivelog.c are shared only between pg_basebackup
and pg_receivewal, but the tar method is used only by pg_basebackup.
The directory mode is shared, but actually the two programs need a
bunch of different things. pg_recievelog needs a bunch of logic to
deal with the possibility that it is receiving data into an existing
directory and that this directory might at any time start to be used
to feed a standby, or used for PITR. That's not an issue for
pg_basebackup: if it fails, the whole directory will be removed, so
there is no need to worry about fsyncs or padding or overwriting
existing files. On the other hand, pg_basebackup needs a bunch of
logic to create a .done file for each WAL segment which is not
required in the case of pg_receivewal. It feels like we've got as much
conditional logic as we do common logic...

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-19 18:10:12 Re: Tree-walker callbacks vs -Wdeprecated-non-prototype
Previous Message Stephen Frost 2022-09-19 17:05:17 Re: Kerberos delegation support in libpq and postgres_fdw