Re: Deleting a table file does not raise an error when the table is touched afterwards, why?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Westermann <daniel(dot)westermann(at)dbi-services(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting a table file does not raise an error when the table is touched afterwards, why?
Date: 2016-05-30 19:32:34
Message-ID: CAKFQuwY20YCud+-m2QXEn-1uqTwuYMiW8NtpezZQPA2x-9O_rQ@mail.gmail.com
Lists: pgsql-general

On Mon, May 30, 2016 at 2:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Daniel Westermann <daniel(dot)westermann(at)dbi-services(dot)com> writes:
> > - if the above is correct why does PostgreSQL only write a partial file
> back to disk/wal? For me this still seems dangerous as potentially nobody
> will notice it
>
> In quiescent circumstances, Postgres wouldn't have written anything at
> all, and the file would have disappeared completely at server shutdown,
> and you would have gotten some sort of file-not-found error when you tried
> the "count(*)" after restarting. I hypothesize that you did an unclean
> shutdown leading to replaying some amount of WAL at restart, and that WAL
> included writing at least one block of the file (perhaps as a result of a
> hint-bit update, or some other not-user-visible maintenance operation,
> rather than anything you did explicitly). The WAL replay code will
> recreate the file if it doesn't exist on-disk --- this is important for
> robustness. Then you'd have a file that exists on-disk but is partially
> filled with empty pages, which matches the observed behavior. Depending
> on various details you haven't provided, this might be indistinguishable
> from a valid database state.
>
>
I suspect that page checksums might have detected the broken state, but only
if any of the written pages were partial, since the non-overwritten zeros on
the partially written pages would have resulted in a different hash.
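
To make the mechanism Tom describes concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the 8192-byte page size, the function names and the file name are assumptions chosen for illustration. It only shows that recreating a missing file and writing a single later block leaves every earlier block reading back as zeros.

```python
import os
import tempfile

BLCKSZ = 8192  # assumed page size (PostgreSQL's default build value)


def replay_block_write(path, block_no, payload):
    """Hypothetical stand-in for replay: create the file if missing, write one page."""
    page = payload.ljust(BLCKSZ, b"\x00")[:BLCKSZ]
    open(path, "ab").close()          # creates the file when it does not exist
    with open(path, "r+b") as f:
        f.seek(block_no * BLCKSZ)     # seeking past end-of-file is allowed
        f.write(page)                 # the skipped range reads back as zeros


def all_zero_pages(path):
    """Return the block numbers whose contents are entirely zero bytes."""
    zero_page = bytes(BLCKSZ)
    found = []
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            if page == zero_page[:len(page)]:
                found.append(block_no)
            block_no += 1
    return found


def demo():
    """Recreate a 'deleted' file by writing only block 3, then inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "relfile")  # hypothetical file name
        replay_block_write(path, 3, b"some tuple data")
        return os.path.getsize(path), all_zero_pages(path)
```

Calling demo() returns the file size and the list of all-zero blocks. The file ends up four pages long even though only one page was written, and the pages before it are indistinguishable from freshly extended, empty ones.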

> - PostgreSQL assumes that someone with write access to the files knows
> what she/he is doing. ok, but still, in the real world cases like this
> happen (for whatever reason)
>
> [ shrug... ] There's also an implied contract that you don't do "rm -rf /",
> or shoot the disk drive full of holes with a .45, or various other
> unrecoverable actions. We're not really prepared to expend large amounts
> of developer effort, or large amounts of runtime overhead, to detect such
> cases. (In particular, the fact that all-zero pages are a valid state is
> unfortunate from this perspective, but it's more or less forced by
> robustness concerns associated with table-extension behavior. Most users
> would not thank us for making table extension slower in order to issue a
> more intelligible error for examples like this one.)
>

rant

I have to think that we can reasonably ascribe unexpected system state to
causes other than human behavior. In both of the other examples PostgreSQL
would fail to start so I'd say we have expected behavior in the face of
those particular unexpected system states.

IMO too much attention is being paid to the act of recreation. But even
if we presume that the only viable way to recreate this circumstance is to
do so intentionally we've documented a clever way for someone to mess with
the system in a subtle manner.

Up until Tom's last email I got very little out of the discussion. It
doesn't fill me with confidence when such an important topic is taken too
glibly. I suspect a large number of uses of PostgreSQL are in situations
where if the application works everything is assumed to be fine. People
know that random things happen to hardware and that software can have
bugs. That is what this thread describes - a potential situation that
could happen due to non-human causes that results in a somewhat silently
mis-operating system.

There is still quite a bit of hand-waving here though, and I don't know
whether being more precise would really do an end-user enough good that it
would be worth writing up in the user-facing docs. Like all areas I'm sure
this is open to improvement, but I'm satisfied that an event this specific
is unlikely enough to warrant the present behavior.

/rant

David J.
