Re: CREATE DATABASE with filesystem cloning

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: CREATE DATABASE with filesystem cloning
Date: 2024-05-08 13:20:54
Message-ID: CAEudQApu=jn0QOeG9YFfXPMysywVAGCjQNY1GtTabMzBd2UOGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em qua., 8 de mai. de 2024 às 10:06, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
escreveu:

> Hi,
>
> On Wed, 8 May 2024 at 15:23, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> >
> > Em qua., 8 de mai. de 2024 às 08:42, Nazir Bilal Yavuz <
> byavuz81(at)gmail(dot)com> escreveu:
> >>
> >> Hi,
> >>
> >> On Wed, 8 May 2024 at 14:16, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> >> >
> >> >
> >> > Em qua., 8 de mai. de 2024 às 04:37, Nazir Bilal Yavuz <
> byavuz81(at)gmail(dot)com> escreveu:
> >> >>
> >> >> Hi Ranier,
> >> >>
> >> >> Thanks for looking into this!
> >> >>
> >> >> I am not sure why but your reply does not show up in the thread, so I
> >> >> copied your reply and answered it in the thread for visibility.
> >> >>
> >> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
> wrote:
> >> >> >
> >> >> > I know it's coming from copy-and-paste, but
> >> >> > I believe the flags could be:
> >> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> >> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >> >> >
> >> >> > The flags:
> >> >> > O_WRONLY | O_TRUNC
> >> >> >
> >> >> > Allow the OS to make some optimizations, if you deem it possible.
> >> >> >
> >> >> > The flag O_RDWR indicates that the file can be read, which is not
> true in this case.
> >> >> > The destination file will just be written.
> >> >>
> >> >> You may be right about the O_WRONLY flag but I am not sure about the
> >> >> O_TRUNC flag.
> >> >>
> >> >> O_TRUNC from the linux man page [1]: If the file already exists and
> is
> >> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
> >> >> O_WRONLY) it will be truncated to length 0. If the file is a FIFO
> or
> >> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> >> >> effect of O_TRUNC is unspecified.
> >> >
> >> > O_TRUNC is usually used in conjunction with O_WRONLY.
> >> > See the example at:
> >> > open.html
> >> >
> >> > O_TRUNC signals the OS to forget the current contents of the file, if
> it happens to exist.
> >> > In other words, there will be no seeks, only and exclusively writes.
> >>
> >> You are right; the O_TRUNC flag truncates the file, if it happens to
> >> exist. But it should not exist in the first place because the code at
> >> [2] should check this before the OpenTransientFile() call. Even if we
> >> assume somehow the check at [2] does not work, I do not think the
> >> correct operation is to truncate the contents of the existing file.
> >
> > I don't know if there is a communication problem here.
> > But what I'm trying to suggest refers to the destination file,
> > which doesn't matter if it exists or not?
>
> I do not think there is a communication problem. Actually it matters
> because the destination file should not exist, there is a code [2]
> which already checks and confirms that it does not exist.
>
I got it.

>
> >
> > If the destination file does not exist, O_TRUNC is ignored.
> > If the destination file exists, O_TRUNC truncates the current contents
> of the file.
> > I don't see why you think it's a problem to truncate the current content
> if the destination file exists.
> > Isn't he going to be replaced anyway?
>
> 'If the destination file does not exist' means the code at [2] works
> well and we do not need the O_TRUNC flag.
>
True, the O_TRUNC is ignored in this case.

>
> 'If the destination file already exists' means the code at [2] is
> broken somehow and there is a high chance that we could truncate
> something that we do not want to. For example, there is a foo db and
> we want to create bar db, Postgres chose the foo db's location as the
> destination of the bar db (which should not happen but let's assume
> somehow checks at [2] failed), then we could wrongly truncate the foo
> db's contents.
>
Of course, truncating the wrong file would be pretty bad.

>
> Hence, if Postgres works successfully I think the O_TRUNC flag is
> unnecessary but if Postgres does not work successfully, the O_TRUNC
> flag could cause harm.
>
The disaster will happen anyway, but of course we can help in some way.
Even without truncating, the wrong file will be destroyed anyway.

> >
> > Unless we want to preserve the current content (destination file), in
> case the copy/clone fails?
>
> Like I said above, Postgres should not choose the existing file as the
> destination file.
>
> Also, we have O_CREAT | O_EXCL flags together, from the link [3] you
> shared before: If O_CREAT and O_EXCL are set, open() shall fail if the
> file exists. So, overwriting the already existing file is already
> prevented.
>
That said, I agree that not using O_TRUNC helps in some way.

best regards,
Ranier Vilela

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-05-08 13:44:12 Re: AIX support
Previous Message Nazir Bilal Yavuz 2024-05-08 13:06:45 Re: CREATE DATABASE with filesystem cloning