From: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | RE: [HACKERS] mdnblocks is an amazing time sink in huge relations |
Date: | 1999-10-18 05:40:47 |
Message-ID: | 000c01bf192b$5437e2a0$2801007e@cadzone.tpf.co.jp |
Lists: | pgsql-hackers |
> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> > I have been suspicious about the current implementation of md.c.
> > It relies so much on information about existing physical files.
>
> Yes, but on the other hand we rely completely on those same physical
> files to hold our data ;-). I don't see anything fundamentally
> wrong with using the existence and size of a data file as useful
> information. It's not a substitute for a lock, of course, and there
> may be places where we need cross-backend interlocks that we haven't
> got now.
>
We have to lseek() each time to find the number of blocks in a table
file. Isn't that an overhead?
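
To make the cost concrete, here is a minimal sketch of deriving a block count from the file size with lseek(). It is not the code in md.c; the helper names blocks_in_fd() and blocks_in_file() are hypothetical, and BLCKSZ is assumed to be 8192.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size */

/*
 * Hypothetical helper: number of whole blocks in an open file, or -1 on
 * error.  Every call costs one lseek() system call and leaves the file
 * position at end of file.
 */
long
blocks_in_fd(int fd)
{
    off_t       len = lseek(fd, 0, SEEK_END);

    if (len < 0)
        return -1;
    return (long) (len / BLCKSZ);
}

/* Hypothetical convenience wrapper: same question, asked by path. */
long
blocks_in_file(const char *path)
{
    int         fd;
    long        nblocks;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    nblocks = blocks_in_fd(fd);
    close(fd);
    return nblocks;
}
```

A relation split into several segment files would need one such call per segment, which is where the repeated overhead comes from.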
> > What do you think about the following?
> >
> > 2. If a backend were killed or crashed in the middle of
> > mdunlink()/mdtruncate(), half of the segments wouldn't be unlinked/
> > truncated.
>
> That's bothered me too. A possible answer would be to do the unlinking
> back-to-front (zap the last file first); that'd require a few more lines
> of code in md.c, but a crash midway through would then leave a legal
> file configuration that another backend could still do something with.
Oops, it's more serious than I had thought.
If a crash occurs while unlinking back-to-front, mdunlink() may end up
only truncating the table file.
A crash while unlinking front-to-back may leave behind segments that
were never unlinked, and they would suddenly appear as segments of the
recreated table.
It seems there's no easy fix.
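
A minimal sketch of the back-to-front order discussed above, not the actual md.c code. The function names are hypothetical, and the segment naming ("base", "base.1", "base.2", ...) is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: build the path of segment number segno. */
static void
segment_path(char *buf, size_t len, const char *base, int segno)
{
    if (segno == 0)
        snprintf(buf, len, "%s", base);
    else
        snprintf(buf, len, "%s.%d", base, segno);
}

/*
 * Remove segments nsegs-1 down to 0.  Returns 0 on success, -1 on the
 * first failure other than "file does not exist".
 *
 * If the process dies after removing segment k, segments 0..k-1 are
 * still present and contiguous: the table looks truncated, not dropped.
 */
int
unlink_segments_back_to_front(const char *base, int nsegs)
{
    char        path[1024];
    int         segno;

    assert(base != NULL && nsegs >= 0);
    for (segno = nsegs - 1; segno >= 0; segno--)
    {
        segment_path(path, sizeof(path), base, segno);
        if (unlink(path) != 0 && errno != ENOENT)
            return -1;
    }
    return 0;
}
```

Running the loop upward from segment 0 instead gives the other failure mode: a crash leaves the higher-numbered segments on disk with no first segment in front of them.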
> > 3. In the cygwin port, mdunlink()/mdtruncate() may leave segments of 0
> > length.
>
> I don't understand what causes this. Can you explain?
>
To unlink a segment, md.c calls FileTruncate() and then FileUnlink().
If FileUnlink() fails, segments of 0 length remain. But that doesn't
seem critical for this issue.
> > 4. We couldn't mdcreate() existing files and couldn't mdopen()/
> > mdunlink() non-existent files. So there are some cases where we
> > could neither CREATE TABLE nor DROP TABLE.
>
> True, but I think this is probably the best thing for safety's sake.
> It seems to me there is too much risk of losing or overwriting valid
> data if md.c bulls ahead when it finds an unexpected file configuration.
> I'd rather rely on manual cleanup if things have gotten that seriously
> out of whack... (but that's just my opinion, perhaps I'm in the
> minority?)
>
There is another risk.
We may remove other tables' files by mistake during manual cleanup.
And if I were a newcomer, I would not consider PostgreSQL
a real DBMS (fortunately I have never seen any mention of this).
However, I don't object, because I have the same anxiety
and can offer no easy solution.
It would probably require a lot of work to fix correctly:
postponing the real unlink/truncate until commit, creating table
files which correspond to their oids ..... etc ...
That's the same as what "DROP TABLE inside transactions" requires.
Hmm, is it worth the work?
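
One way to picture "postponing the real unlink until commit" is a list of pending removals that is acted on only when the transaction commits. This is a rough sketch under that assumption, not a proposed patch; PendingUnlink, schedule_unlink() and finish_transaction() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One file whose removal has been requested by the current transaction. */
typedef struct PendingUnlink
{
    char       *path;
    struct PendingUnlink *next;
} PendingUnlink;

static PendingUnlink *pending_head = NULL;

/* Remember a file to remove at commit.  Returns 0 on success, -1 on OOM. */
int
schedule_unlink(const char *path)
{
    size_t      len;
    PendingUnlink *item;

    assert(path != NULL);
    len = strlen(path) + 1;
    item = malloc(sizeof(*item));
    if (item == NULL)
        return -1;
    item->path = malloc(len);
    if (item->path == NULL)
    {
        free(item);
        return -1;
    }
    memcpy(item->path, path, len);
    item->next = pending_head;
    pending_head = item;
    return 0;
}

/*
 * End the transaction.  On commit the files are really unlinked; on
 * abort the list is discarded and the files stay untouched.
 * Returns the number of files removed.
 */
int
finish_transaction(int committed)
{
    int         removed = 0;

    while (pending_head != NULL)
    {
        PendingUnlink *item = pending_head;

        pending_head = item->next;
        if (committed)
        {
            if (unlink(item->path) == 0)
                removed++;
            else
                perror(item->path);
        }
        free(item->path);
        free(item);
    }
    return removed;
}
```

The list lives only in memory, so a crash between commit and the unlink calls would still leave stray files; closing that gap is part of the "lot of work".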
Regards.
Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp