Re: thinko in basic_archive.c

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: thinko in basic_archive.c
Date: 2022-10-21 05:13:46
Message-ID: 20221021.141346.1491472262189507263.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Anyway, on second thought, lager picture than just adding the
post-process-end callback would out of the scope of this patch. So I
write some comments on the patch first, then discussion the rest.

Thu, 20 Oct 2022 13:29:12 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> > No. I didn't mean that, If server stops after a successfull
> > durable_rename but before the next call to
> > basic_archive_file_internal, that call back makes false comlaint since
> > that temprary file is actually gone.
>
> Right. Fixed it.

Thanks, but we don't need to wipe out the all bytes. Just putting \0
at the beginning of the buffer is sufficient. And the Memset() at the
beginning of basic_archive_file_internal is not needed since that
static variables are initially initialized to zeros.

This is not necessarily needed, but it might be better we empty
tempfilepath after unlinking the file.

> > +static char tempfilepath[MAXPGPATH + 256];
> >
> > MAXPGPATH is the maximum length of a file name that PG assumes to be
> > able to handle. Thus extending that length seems wrong.
>
> I think it was to accommodate the temp file name - "archtemp", file,
> MyProcPid, epoch, but I agree that it can just be MAXPGPATH. However,
> most of the places the core defines the path name to be MAXPGPATH +
> some bytes.

Mmm. I found that basic_archive already does the same thing. So lets
follow that in this patch.

+ expectation that a value will soon be provided. Care must be taken when
+ multiple servers are archiving to the same
+ <varname>basic_archive.archive_library</varname> directory as they all
+ might try to archive the same WAL file.

I don't understand what kind of care should be taken by reading this..

Anyway the PID + millisecond-resolution timestamps work in the almost
all cases, but it's not perfect. So.. I don't come up with what to
think about this..

Traditionally we told people that "archiving should not overwrite a
file unconditionally. Generally it is safe only when the contents are
identical then should be errored-out otherwise.".. Ah this is.

https://www.postgresql.org/docs/devel/continuous-archiving.html

> Archive commands and libraries should generally be designed to
> refuse to overwrite any pre-existing archive file. This is an
> important safety feature to preserve the integrity of your archive
> in case of administrator error (such as sending the output of two
> different servers to the same archive directory). It is advisable to
> test your proposed archive library to ensure that it does not
> overwrite an existing file.
...
> file again after restarting (provided archiving is still
> enabled). When an archive command or library encounters a
> pre-existing file, it should return a zero status or true,
> respectively, if the WAL file has identical contents to the
> pre-existing archive and the pre-existing archive is fully persisted
> to storage. If a pre-existing file contains different contents than
> the WAL file being archived, the archive command or library must
> return a nonzero status or false, respectively.

On the other hand, basic_archive seems to overwrite existing files
unconditionally. I think this is not great, in that we offer a tool
betrays to our own wrtten suggestion...

The following is out-of-the-scope discussions.

==================
At Thu, 20 Oct 2022 13:29:12 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Thu, Oct 20, 2022 at 6:57 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote: > > > The archive module must be
> responsible for cleaning up the temp file > > that it creates. One
> way to do it is in the archive module's shutdown > > callback, this
> covers most of the cases, but not all. > > True. But I agree to
> Robert that such temporary files should be > cleanup-able without
> needing temporarily-valid knowledge (current file > name, in this
> case). A common strategy for this is to name those files > by names
> that can be identifed as garbage.
>
> I'm not sure how we can distinguish temp files as garbage based on
> name. As Robert pointed out upthread, using system identifier may not
> help as the standbys share the same system identifier and it's
> possible that they might archive to the same directory. Is there any
> other way?

Correct naming scheme would lead to resolution.

> > But since power cut is a typical crash source, we need to identify
> > ruined temporary files and the current naming convention is incomplete
> > in this regard.
>
> Please note that basic_archive module creates one temp file at a time
> to make file copying/moving atomic and it can keep track of the temp
> file name and delete it using shutdown callback which helps in most of
> the scenarios. As said upthread, repeated crashes while basic_archive
> module is atomically copying files around is a big problem in itself
> and basic_archive module need not worry about it much.

I'm not sure. It's a "basic_archiver", but an "example_archiver". I
read the name as "it is no highly configuratable but practically
usable". In this criteria, clean up feature is not too much.

> > there. However, I think we can identify truly rotten temp files by
> > inserting host name or cluster name (means cluster_name in
> > postgresql.conf) even in that case. This premise that DBA names every
>
> Well, these ideas are great! However, we can leave defining such
> strategies to archive module implementors. IMO, the basich_archive
> module ought to be as simple and elegant as possible yet showing up
> the usability of archive modules feature.

I don't understand why garbage-cleanup is not elegant.

On the other hand, if we pursue minimalism about this tool, we don't
need the timestamp part since this tool cannot write two or more files
simultaneously by the same process. (I don't mean we shoud remove that
part.)

> > Of course, it premised that a cluster uses the same name for a
> > segment. If we insert cluseter_name into the temprary name, a starting
> > cluster can indentify garbage files to clean up. For example if we
> > name them as follows.
> >
> > ARCHTEMP_cluster1_pid_time_<lsn>
> >
> > A starting cluster can clean up all files starts with
> > "archtemp_cluster1_*". (We need to choose the delimiter carefully,
> > though..)
>
> Postgres cleaning up basic_archive modules temp files at the start up
> isn't a great idea IMO. Because these files are not related to server
> functionality in any way unlike temp files removed in
> RemovePgTempFiles(). IMO, we ought to keep the basic_archive module
> simple.

I think "init" feature is mandatory. But it would be another project.

> > > Please see the attached v2 patch.
> >
> > +static char tempfilepath[MAXPGPATH + 256];
> >
> > MAXPGPATH is the maximum length of a file name that PG assumes to be
> > able to handle. Thus extending that length seems wrong.
>
> I think it was to accommodate the temp file name - "archtemp", file,
> MyProcPid, epoch, but I agree that it can just be MAXPGPATH. However,
> most of the places the core defines the path name to be MAXPGPATH +
> some bytes.

Oooh. I don't say "most" but some instances are found. (Almost all
usage of MAXPGPATH are not accompanied by additional length). But it
would be another issue.

Anyway I don't think even if the use of over-sized path buffers is
widely spread in our tree, it cannot be a reason for this new code
need to do the same thing. If we follow that direction, the following
code in basic_archive should have MAXPGPATH*2 wide, since it stores a
"<directory>/<filename>" construct. (That usage is found in, e.g.,
dbsize.c, which of course I think stupid..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-10-21 05:25:57 Re: thinko in basic_archive.c
Previous Message Justin Pryzby 2022-10-21 03:40:40 Re: Cygwin cleanup