Re: drop/truncate table sucks for large values of shared buffers

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: drop/truncate table sucks for large values of shared buffers
Date:	2015-06-29 01:00:19
Message-ID:	CAA4eK1JyKYq2E8L3DeRE7LVUkEu5UTMFTz-ULMuv6NZyQkV0eg@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jun 28, 2015 at 9:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 27 June 2015 at 15:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I don't like this too much because it will fail badly if the caller
> >> is wrong about the maximum possible page number for the table, which
> >> seems not exactly far-fetched. (For instance, remember those kernel bugs
> >> we've seen that cause lseek to lie about the EOF position?)
>
> > If that is true, then our reliance on lseek elsewhere could also cause data
> > loss, for example by failing to scan data during a seq scan.
>
> The lseek point was a for-example, not the entire universe of possible
> problem sources for this patch. (Also, underestimating the EOF point in
> a seqscan is normally not an issue since any rows in a just-added page
> are by definition not visible to the scan's snapshot.

How do we know whether a just-added page was added before or after the
scan's snapshot was taken? If it was added before, then the point Simon
raised above is valid. Does this mean that all other usages of
smgrnblocks()/mdnblocks() are safe with respect to this issue, or that the
consequences there will not be as bad as they are for this usage?

> But I digress.)
>
> > The consequences of failure of lseek in this case are nowhere near as dire,
> > since by definition the data is being destroyed by the user.
>
> I'm not sure what you consider "dire", but missing a dirty buffer
> belonging to the to-be-destroyed table would result in the system being
> permanently unable to checkpoint, because attempts to write out the buffer
> to the no-longer-extant file would fail.
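
To make that failure mode concrete, here is a minimal toy model in Python.
It is not PostgreSQL code: ToyBufferPool, drop_by_full_scan, drop_by_lookup
and make_pool are hypothetical names made up for illustration. It only shows
that a per-block lookup bounded by an underestimated block count can leave a
dirty buffer behind, whereas a full scan of the pool cannot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a buffer tag: (relation id, block number).
BufferTag = Tuple[int, int]


@dataclass
class Buffer:
    tag: Optional[BufferTag] = None
    dirty: bool = False


class ToyBufferPool:
    """Toy shared-buffer pool with a tag -> slot lookup table."""

    def __init__(self, nbuffers: int) -> None:
        self.buffers: List[Buffer] = [Buffer() for _ in range(nbuffers)]
        self.lookup: Dict[BufferTag, int] = {}

    def load(self, rel: int, block: int, dirty: bool = False) -> int:
        # Place the page in the first free slot and register it for lookup.
        for slot, buf in enumerate(self.buffers):
            if buf.tag is None:
                buf.tag = (rel, block)
                buf.dirty = dirty
                self.lookup[buf.tag] = slot
                return slot
        raise RuntimeError("no free buffer")

    def _invalidate(self, slot: int) -> None:
        buf = self.buffers[slot]
        del self.lookup[buf.tag]
        buf.tag = None
        buf.dirty = False

    def drop_by_full_scan(self, rel: int) -> int:
        # Cost grows with the pool size, but no buffer of rel can be missed.
        dropped = 0
        for slot, buf in enumerate(self.buffers):
            if buf.tag is not None and buf.tag[0] == rel:
                self._invalidate(slot)
                dropped += 1
        return dropped

    def drop_by_lookup(self, rel: int, nblocks: int) -> int:
        # Cost grows with nblocks, but correctness depends on nblocks
        # being at least the true number of blocks.
        dropped = 0
        for block in range(nblocks):
            slot = self.lookup.get((rel, block))
            if slot is not None:
                self._invalidate(slot)
                dropped += 1
        return dropped

    def dirty_buffers_of(self, rel: int) -> List[BufferTag]:
        return [b.tag for b in self.buffers
                if b.tag is not None and b.tag[0] == rel and b.dirty]


def make_pool() -> ToyBufferPool:
    # Relation 1 has four cached blocks; only the last one is dirty.
    pool = ToyBufferPool(nbuffers=8)
    for block in range(4):
        pool.load(rel=1, block=block, dirty=(block == 3))
    return pool


if __name__ == "__main__":
    pool = make_pool()
    pool.drop_by_lookup(rel=1, nblocks=3)   # caller believes there are 3 blocks
    print("left behind:", pool.dirty_buffers_of(1))
```

In this model the leftover buffer is exactly the kind a later checkpoint
would try to write to a file that no longer exists.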

So another idea here could be that, instead of failing, we just ignore the
error when the object to which that page belongs no longer exists; then we
could make Drop free by not invalidating shared_buffers at all for
Drop/Truncate. I think this might not be a sane idea, though, as we would
need a way to look up objects from the checkpoint and would need to handle
the case where the same Oid is assigned to a new object (after wraparound?).
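
A rough sketch of why identifier reuse is the worrying part, again as a toy
Python model rather than anything resembling the real checkpoint code. The
names checkpoint_ignoring_missing, Storage and DirtyBuffer are hypothetical,
and storage is just a dict keyed by an integer identifier standing in for
whatever identifies the on-disk object.

```python
from typing import Dict, List, Tuple

# Hypothetical model: identifier -> {block number -> page contents}.
Storage = Dict[int, Dict[int, bytes]]
# Hypothetical dirty buffer: (identifier, block number, page contents).
DirtyBuffer = Tuple[int, int, bytes]


def checkpoint_ignoring_missing(dirty: List[DirtyBuffer],
                                storage: Storage) -> List[DirtyBuffer]:
    """Write dirty buffers, silently skipping those whose object is gone.

    Returns the buffers that were skipped.
    """
    skipped: List[DirtyBuffer] = []
    for ident, block, data in dirty:
        blocks = storage.get(ident)
        if blocks is None:
            # Object was dropped: ignore the error instead of failing.
            skipped.append((ident, block, data))
            continue
        # The object "exists", so the page is written. Nothing here can
        # tell whether it is the same object the buffer was read from.
        blocks[block] = data
    return skipped


if __name__ == "__main__":
    stale = [(42, 0, b"old table page")]

    # Case 1: object 42 was dropped and nothing reused its identifier.
    storage: Storage = {7: {}}
    print(checkpoint_ignoring_missing(stale, storage), storage)

    # Case 2: a new object was created with the same identifier.
    storage = {42: {0: b"new table page"}}
    checkpoint_ignoring_missing(stale, storage)
    print(storage)  # the stale page has overwritten the new object's block
```

The first case is the benign one. The second is the case that needs
handling: existence of the identifier alone is not enough to decide whether
a leftover buffer may be written.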

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: drop/truncate table sucks for large values of shared buffers at 2015-06-28 16:17:18 from Tom Lane