Re: [HACKERS] RE: [GENERAL] Transaction logging

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: Jan Wieck <jwieck(at)debis(dot)com>
Cc: Michael Davis <michael(dot)davis(at)prevuenet(dot)com>, pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] RE: [GENERAL] Transaction logging
Date: 1999-07-07 02:17:50
Message-ID: 199907070217.WAA00377@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Added to TODO:

* Transaction log, so re-do log can be on a separate disk by
logging SQL queries, or before/after row images

> Michael Davis wrote:
>
> >
> > What would it take to have transaction logging added to Postgres. I am a
> > c/c++ programmer and will consider contributing to the Postgres development
> > effort. I really like everything I see and read about Postgres. As a
>
> I spent some time on transaction logging since it's a feature
> I'm missing too. There are mainly two different transaction
> log mechanisms out.
>
> 1. Log queries sent to the backend.
>
> 2. Log images of inserted/updated rows and row ID's of
> deleted ones.
>
> The query level logging will write less information if
> queries usually affect a large number of rows. Unfortunately
> the extensibility of Postgres work's against this approach.
> There could be any number of user written functions who's
> results aren't reproduceable during recovery. And core
> features of Postgres itself would introduce the same problem.
> Have a sequence which is used to create default values for
> multiple tables, so that one ID is unique across them. Now
> two backends insert (with INSERT ... SELECT) concurrently
> into different tables using the same sequence. It's a
> classic race condition and it depends on context switching
> and page faults which backend will get which sequence
> numbers. You cannot foresee and you cannot reproduce, except
> you hook into the sequence generator and log this too. Later
> when recovering, another hook into the sequence generator
> must reproduce the logged results on the per
> backend/transaction/command base, and the same must be done
> for each function that usually returns unreproduceable
> results (anything dealing with time, pid's, etc.).
>
> As said, this must also cover user functions. So at least
> there must be a general log API that provides such a
> functionality for user written functions.
>
> The image logging approach also has problems. First, the only
> thing given to the heap access methods to outdate a tuple on
> update/delete is the current tuple ID (information that tells
> which tuple in which block is meant). So you need to save
> the database files in binary format, because during the
> actually existing dump/restore this could change and the
> logged CTID's would hit the wrong tuples.
>
> Second, you must remember in the log which transaction ID
> these informations came from and later if the transaction
> committed or not, so the recovery can set this commit/abort
> information in pg_log too. pg_log is a shared system file and
> the transaction ID's are unique only for one server. Using
> this information for online replication of a single database
> to another Postgres installation will not work.
>
> Third, there are still some shared system catalogs across all
> databases (pg_database, pg_group, pg_log!!!, pg_shadow and
> pg_variable). Due to that it would be impossible (or at least
> very, very tricky) to restore/recover (maybe point in time)
> one single database. If you destroy one database and restore
> it from the binary backup, these shared catalogs cannot be
> restored too, so they're out of sync with the backup time.
> How should the recovery now hit the right things (which
> probably must not be there at all)?.
>
> All this is really a mess. I think the architecture of
> Postgres will only allow something on query level with some
> general API for things that must reproduce the same result
> during recovery. For example time(). Inside the backend,
> time() should never be called directly. Instead another
> function is to be called that log's during normal operation
> which time get's returned by this particular function call
> and if the backend is in recovery mode, returns the value
> from the log.
>
> And again, this all means trouble. Usually, most queries sent
> to the database don't change any data because they are
> SELECT's. It would dramatically blow up the log amount if you
> log ALL queries instead of only those that modify things. But
> when the query begins, you don't know this, because a SELECT
> might call a function that uses SPI to UPDATE something else.
> So the decision if the query must be logged or not can only
> be made when the query is done (by having some global
> variable where the heap access methods set a flag that
> something got written). Now you have to log function call's
> like time() even if the query will not modify any single row
> in the database because the query is a
>
> SELECT 'now'::datetime - updtime FROM ...
>
> Doing this on a table with thousands of rows will definitely
> waste much logging space and slowdown the whole thing by
> unnecessary logging.
>
> Maybe it's a compromise if at each query start the actual
> time and other such information is remembered by the backend,
> all time() calls return this remembered value instead of the
> real one (wouldn't be bad anyway IMHO), and this information
> is logged only if the query is to be logged.
>
> Finally I think I must have missed some more problems, but
> aren't these enough already to frustrate you :-?
>
>
> Jan
>
> --
>
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me. #
> #======================================== jwieck(at)debis(dot)com (Jan Wieck) #
>
>
>
>

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martin Weinberg 1999-07-07 02:48:20 Problems with inequalities on numeric fields in 6.5
Previous Message Bruce Momjian 1999-07-07 02:06:27 Re: [GENERAL] inet and cidr type problems

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-07-07 02:19:34 Re: [HACKERS] Niladic functions
Previous Message Bruce Momjian 1999-07-07 02:06:27 Re: [GENERAL] inet and cidr type problems