Re: unlogged tables

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kenneth Marshall <ktm(at)rice(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>
Subject:	Re: unlogged tables
Date:	2010-11-17 20:51:37
Message-ID:	AANLkTi=iRNbnJ1ZaKkN4b_KY7Oh-eq-oEaV4-hSGJ_MN@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Nov 17, 2010 at 3:35 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> The customer is always right, but the informed customer makes better
>> decisions than the uninformed customer. This idea, as proposed, does
>> not work. If you only include dirty buffers at the final checkpoint
>> before shutting down, you have no guarantee that any buffers that you
>> either didn't write or didn't fsync previously are actually on disk.
>> Therefore, you have no guarantee that the table data is not corrupted.
>> So you really have to decide between including the unlogged-table
>> buffers in EVERY checkpoint and not ever including them at all. Which
>> one is right depends on your use case.
> How can you get a buffer which was not written out *at all*? Do you want to
> force all such pages to stay in shared_buffers? That sounds quite a bit more
> complicated than what you proposed...

Oh, you're right. We always have to write buffers before kicking them
out of shared_buffers, but if we don't fsync them we have no guarantee
they're actually on disk.
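
To make the write-versus-fsync distinction concrete, here is a minimal sketch in plain POSIX C. It is not PostgreSQL code: write_file and its durable flag are hypothetical names chosen for illustration. After write() returns, the data only has to be in the kernel's page cache; it is the fsync() call that asks the kernel to push it to stable storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): write len bytes to path.
 * If durable is nonzero, also fsync() the file before closing it.
 * Returns 0 on success, -1 on any error.
 */
int write_file(const char *path, const void *buf, size_t len, int durable)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /*
     * At this point the data has been handed to the kernel, but it may
     * still live only in the page cache. Without the fsync() below, a
     * crash of the machine can lose it.
     */
    if (durable && fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

Calling write_file(path, buf, len, 0) corresponds to the "written but not fsynced" state discussed above; passing 1 as the last argument adds the durability step.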

>> For example, consider the poster who said that, when this feature is
>> available, they plan to try ripping out their memcached instance and
>> replacing it with PostgreSQL running unlogged tables. Suppose this
>> poster (or someone else in a similar situation) has a 64 GB machine and is
>> currently running a 60 GB memcached instance on it, which is not an
>> unrealistic scenario for memcached. Suppose further that he dirties
>> 25% of that data each hour. memcached is currently doing no writes to
>> disk. When he switches to PostgreSQL and sets checkpoint_segments to
>> a gazillion and checkpoint_timeout to the maximum, he's going to start
>> writing 15 GB of data to disk every hour - data which he clearly
>> doesn't care about losing, or preserving across restarts, because he's
>> currently storing it in memcached. In fact, with memcached, he'll not
>> only lose data at shutdown - he'll lose data on a regular basis when
>> everything is running normally. We can try to convince ourselves that
>> someone in this situation will not care about needing to get 15GB of
>> disposable data per hour from memory to disk in order to have a
>> feature that he doesn't need, but I think it's going to be pretty hard
>> to make that credible.
> To really support that use case we would first need to make shared_buffers
> properly scale to 64GB - which unfortunately, in my experience, is not yet the
> case.

Well, that's something to aspire to. :-)

> Also, see the issues in the former paragraph - I have severe doubts you can
> support such a memcached scenario by pg. Either you spill to disk if your
> buffers overflow (fine with me) or you need to throw away data like memcached does. I
> doubt there is a sensible implementation in pg for the latter.
>
> So you will have to write to disk at some point...

I agree that there are difficulties, but again, doing checkpoint I/O
for data that the user was willing to throw away is going in the wrong
direction.

>> Third use case. Someone on pgsql-general mentioned that they want to
>> write logs to PG, and can abide losing them if a crash happens, but
>> not on a clean shutdown and restart. This person clearly shuts down
>> their production database a lot more often than I do, but that is OK.
>> By explicit stipulation, they want the survive-a-clean-shutdown
>> behavior. I have no problem supporting that use case, providing they
>> are willing to take the associated performance penalty at checkpoint
>> time, which we don't know because we haven't asked, but I'm fine with
>> assuming it's useful even though I probably wouldn't use it much
>> myself.
> Maybe I am missing something - but why does this imply we have to write data
> at checkpoints?
> Just fsyncing every file belonging to a persistently-unlogged (or whatever
> sensible name anyone can come up with) table is not prohibitively expensive - in
> fact, doing that on a local $PGDATA with approx. 300GB and loads of tables
> takes less than 15s on a system with hot inode/dentry cache and no dirty files
> (just `find $PGDATA -print0|xargs -0 fsync_many_files` with fsync_many_files
> being a tiny C program doing posix_fadvise(POSIX_FADV_DONTNEED) on all files
> and then fsyncing every one).
> The assumption of a hot inode cache is realistic I think.

Hmm. I don't really want to try to do it in this patch because it's
complicated enough already, but if people don't mind the shutdown
sequence potentially being slowed down a bit, that might allow us to
have the best of both worlds without needing to invent multiple
durability levels. I was sort of assuming that people wouldn't want
to slow down the shutdown sequence to avoid losing data they've
already declared isn't that valuable, but evidently I underestimated
the demand for kinda-durable tables. If the overhead of doing this
isn't too severe, it might be the way to go.
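
For readers who want to picture the fsync_many_files helper from the quoted text: its source was not posted, so the following is only a rough sketch of what such a program might do. The function signature, the error handling, and the choice to open files read-only are all assumptions. The two-pass order (posix_fadvise on every file first, then fsync on every file) follows the description in the quoted text; the reasoning behind that order is not stated there.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch of the helper described in the quoted text (the real program was
 * not posted; names and error handling here are assumptions).
 * Returns the number of paths that could not be opened or fsynced.
 */
int fsync_many_files(int nfiles, char *const paths[])
{
    int failures = 0;

    /* Pass 1: issue POSIX_FADV_DONTNEED for the whole of every file. */
    for (int i = 0; i < nfiles; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd < 0)
            continue;           /* reported and counted in pass 2 */
        /* Advice only; a failure here (e.g. on a directory) is ignored. */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }

    /*
     * Pass 2: fsync every file. Opening read-only is enough for fsync()
     * on Linux; other platforms may require a writable descriptor.
     */
    for (int i = 0; i < nfiles; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd < 0) {
            perror(paths[i]);
            failures++;
            continue;
        }
        if (fsync(fd) != 0) {
            perror(paths[i]);
            failures++;
        }
        close(fd);
    }

    return failures;
}
```

To use it the way the quoted pipeline does, a small main() would pass argc - 1 and argv + 1 to this function and return nonzero if any failures were reported, so that xargs can feed it batches of file names.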

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: unlogged tables at 2010-11-17 20:35:05 from Andres Freund

Responses

Re: unlogged tables at 2010-11-17 21:06:23 from Alvaro Herrera
