Re: On-disk Tuple Size

From:	Curt Sampson <cjs(at)cynic(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: On-disk Tuple Size
Date:	2002-04-21 19:50:55
Message-ID:	Pine.NEB.4.43.0204220432470.8450-100000@angelic.cynic.net
Lists:	pgsql-general pgsql-hackers

On Sun, 21 Apr 2002, Tom Lane wrote:

> At this point you're essentially arguing that it's faster to recompute
> the list of item sizes than it is to read it off disk. Given that the
> recomputation would require sorting the list of item locations (with
> up to a couple hundred entries --- more than that if blocksize > 8K)
> I'm not convinced of that.

No, not at all. What I'm arguing is that the I/O savings gained from
removing two bytes from the tuple overhead will more than compensate for
having to do a little bit more computation after reading the block.

How do I know? Well, I have very solid figures. I know because I pulled
them straight out of my....anyway. :-) Yeah, it's more or less instinct
that says to me that this would be a win. If others don't agree, there's
a pretty reasonable chance that I'm wrong here. But I think it might
be worthwhile spending a bit of effort to see what we can do to reduce
our tuple overhead. After all, there is a good commercial DB that has
much, much lower overhead, even if it's not really comparable because it
doesn't use MVCC. The best thing really would be to see what other good
MVCC databases do. I'm going to go to the bookshop in the next few days
and try to find out what Oracle's physical layout is.

> Another difficulty is that we'd lose the ability to record item sizes
> to the exact byte. What we'd reconstruct from the item locations are
> sizes rounded up to the next MAXALIGN boundary. I am not sure that
> this is a problem, but I'm not sure it's not either.

Well, I don't see any real problem with it, but yeah, I might well be
missing something here.
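
To make the recomputation under discussion concrete, here is a minimal
sketch of it in Python. It is an illustration only, not PostgreSQL's
page code: the names maxalign, place_items and reconstruct_sizes are
hypothetical, MAXALIGN is assumed to be 8, and items are assumed to be
packed back to back up to the end of the page with no special space.
It shows both points raised above: the offsets have to be sorted to
find each item's neighbour, and what comes back is the size rounded up
to the alignment boundary rather than the exact byte length.

```python
MAXALIGN = 8      # assumed alignment; the real value is platform-dependent
PAGE_SIZE = 8192  # assumed block size


def maxalign(n, align=MAXALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def place_items(lengths, page_size=PAGE_SIZE):
    """Pack items downward from the end of the page.

    Each item is given maxalign(length) bytes. Returns the start
    offset of every item, in item-number order.
    """
    offsets = []
    upper = page_size
    for length in lengths:
        upper -= maxalign(length)
        offsets.append(upper)
    return offsets


def reconstruct_sizes(offsets, page_size=PAGE_SIZE):
    """Recover each item's allotted space from the offsets alone.

    Item numbers need not follow physical order, so the offsets are
    sorted first; an item's size is the distance to the next higher
    offset, or to the end of the page for the last item.
    """
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    sizes = [0] * len(offsets)
    for rank, i in enumerate(order):
        if rank + 1 < len(order):
            end = offsets[order[rank + 1]]
        else:
            end = page_size
        sizes[i] = end - offsets[i]
    return sizes


exact_lengths = [61, 40, 7]
offsets = place_items(exact_lengths)
recovered = reconstruct_sizes(offsets)
print(offsets)    # [8128, 8088, 8080]
print(recovered)  # [64, 40, 8]
```

The exact lengths 61 and 7 come back as 64 and 8, which is the loss of
byte-exact sizes described above. The sort is the extra per-block
computation being weighed against the two bytes saved per tuple.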

> The larger BLCKSZ limit isn't nearly as desirable as it used to be,
> because of TOAST, and in fact it could be a net loser because of
> increased WAL traffic. But it'd be interesting to try it and see.

Mmmm, I hadn't thought about the WAL side of things. In an ideal world,
it wouldn't be a problem because WAL writes would be related only to
tuple size, and would have nothing to do with block size. Or so it seems
to me. But I have to go read the WAL code a bit before I care to make
any real assertions there.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Re: On-disk Tuple Size at 2002-04-21 19:10:41 from Tom Lane