RE: [HACKERS] mdnblocks is an amazing time sink in huge relations

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations
Date: 1999-10-18 05:40:47
Message-ID: 000c01bf192b$5437e2a0$2801007e@cadzone.tpf.co.jp
Lists: pgsql-hackers

> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> > I have been suspicious about the current implementation of md.c.
> > It relies so much on information about existing physical files.
>
> Yes, but on the other hand we rely completely on those same physical
> files to hold our data ;-). I don't see anything fundamentally
> wrong with using the existence and size of a data file as useful
> information. It's not a substitute for a lock, of course, and there
> may be places where we need cross-backend interlocks that we haven't
> got now.
>

We have to lseek() every time we want to know the number of blocks in a
table file. Isn't that an overhead?
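
To make the cost concrete, here is a minimal sketch of counting blocks by seeking to the end of a file. It is not the md.c code: the names sketch_nblocks(), sketch_nblocks_path() and SKETCH_BLCKSZ are hypothetical, and the 8192-byte block size is an assumption. The point is only that every call costs at least one lseek() system call per file.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed block size for this sketch only. */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical helper: number of whole blocks in an open file,
 * or -1 if lseek() fails. Costs one system call per invocation.
 */
long sketch_nblocks(int fd)
{
    off_t len;

    assert(fd >= 0);
    len = lseek(fd, 0, SEEK_END);   /* ask the kernel for the file size */
    if (len < 0)
        return -1;
    return (long) (len / SKETCH_BLCKSZ);
}

/* Hypothetical convenience wrapper: open, measure, close. */
long sketch_nblocks_path(const char *path)
{
    long nblocks;
    int  fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    nblocks = sketch_nblocks(fd);
    close(fd);
    return nblocks;
}
```

A relation split into several segment files would need this repeated once per segment, which is where the time goes in huge relations.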

> > What do you think about the following?
> >
> > 2. If a backend was killed or crashed in the middle of executing
> > mdunlink()/mdtruncate(), some of the segments wouldn't be
> > unlinked/truncated.
>
> That's bothered me too. A possible answer would be to do the unlinking
> back-to-front (zap the last file first); that'd require a few more lines
> of code in md.c, but a crash midway through would then leave a legal
> file configuration that another backend could still do something with.

Oops, it's more serious than I had thought.
If a crash happens while unlinking back-to-front, mdunlink() may end up
only truncating the table file.
A crash while unlinking front-to-back may leave segments that were never
unlinked, and they would suddenly reappear as segments of the recreated
table.
It seems there's no easy fix.
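
For reference, a minimal sketch of the back-to-front order, not the actual md.c code. It assumes segment 0 is stored under the base name and later segments under "base.N"; sketch_segpath() and sketch_unlink_back_to_front() are hypothetical names. If the loop is interrupted, the files that remain are segments 0..k with no gap, which is exactly why a crash looks like a truncation rather than a drop.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical naming rule: segment 0 is "base", segment N is "base.N". */
static void sketch_segpath(char *buf, size_t len, const char *base, int segno)
{
    if (segno == 0)
        snprintf(buf, len, "%s", base);
    else
        snprintf(buf, len, "%s.%d", base, segno);
}

/*
 * Unlink nsegs segments, highest-numbered first.
 * Returns 0 on success, -1 at the first unlink() failure.
 * After a failure or crash, the surviving files are a contiguous
 * prefix 0..k, i.e. a shorter but still well-formed relation.
 */
int sketch_unlink_back_to_front(const char *base, int nsegs)
{
    char path[1024];
    int  segno;

    assert(base != NULL && nsegs >= 0);
    for (segno = nsegs - 1; segno >= 0; segno--)
    {
        sketch_segpath(path, sizeof(path), base, segno);
        if (unlink(path) != 0)
            return -1;
    }
    return 0;
}
```

Running the loop from 0 upward instead would leave "base.1", "base.2", ... behind after a crash, ready to be picked up by a new table created under the same name.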

> > 3. In the cygwin port, mdunlink()/mdtruncate() may leave segments
> > of 0 length.
>
> I don't understand what causes this. Can you explain?
>

You call FileUnlink() after FileTruncate() to unlink in md.c. If
FileUnlink() fails, segments of 0 length remain. But that doesn't seem
critical for this issue.

> > 4. We can't mdcreate() existing files and can't mdopen()/
> > mdunlink() non-existent files. So there are some cases where we
> > can neither CREATE TABLE nor DROP TABLE.
>
> True, but I think this is probably the best thing for safety's sake.
> It seems to me there is too much risk of losing or overwriting valid
> data if md.c bulls ahead when it finds an unexpected file configuration.
> I'd rather rely on manual cleanup if things have gotten that seriously
> out of whack... (but that's just my opinion, perhaps I'm in the
> minority?)
>

There is another risk.
We may remove another table's files by mistake during manual cleanup.
And if I were a newcomer, I would not consider PostgreSQL
a real DBMS (fortunately, I have never seen any mention of this).

However, I don't object, because I have the same anxiety
and can offer no easy solution.

It would probably take a lot of work to fix correctly:
postponing the real unlink/truncate until commit, creating table
files named after their OIDs, and so on.
That is the same work that "DROP TABLE inside transactions" requires.
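
A rough sketch of the "postpone the real unlink until commit" idea, only to show its shape. Nothing here is existing PostgreSQL code: the sketch_* names, the fixed-size list and the limits are all hypothetical. Dropping a table only records the path; the file is removed at commit and left alone at abort.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical limits for this sketch only. */
#define SKETCH_MAX_PENDING 16
#define SKETCH_PATH_LEN    256

static char sketch_pending[SKETCH_MAX_PENDING][SKETCH_PATH_LEN];
static int  sketch_npending = 0;

/* Remember a file to remove at commit; nothing is touched on disk yet. */
int sketch_schedule_unlink(const char *path)
{
    assert(path != NULL);
    if (sketch_npending >= SKETCH_MAX_PENDING ||
        strlen(path) >= SKETCH_PATH_LEN)
        return -1;
    strcpy(sketch_pending[sketch_npending++], path);
    return 0;
}

/* Abort: forget the pending removals, so the files survive. */
void sketch_at_abort(void)
{
    sketch_npending = 0;
}

/* Commit: do the real unlinks. Returns the number of failures. */
int sketch_at_commit(void)
{
    int i;
    int failures = 0;

    for (i = 0; i < sketch_npending; i++)
    {
        if (unlink(sketch_pending[i]) != 0)
        {
            perror(sketch_pending[i]);
            failures++;
        }
    }
    sketch_npending = 0;
    return failures;
}
```

This only covers the bookkeeping inside one backend. A crash between commit and the last unlink() would still leave files behind, so it does not by itself solve the crash cases above.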

Hmm, is it worth the work?

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp
