Re: out-of-line (TOAST) storage ineffective when loading from dump?

From: Richard Huxton <dev(at)archonet(dot)com>
To: Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: out-of-line (TOAST) storage ineffective when loading from dump?
Date: 2008-02-19 10:59:22
Message-ID: 47BAB68A.1060709@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Markus Bertheau wrote:
> 2008/2/19, Richard Huxton <dev(at)archonet(dot)com>:
> I'm loading a table with some short attributes and a large toastable attribute.
> That means that for every main table heap page several toast table heap pages
> are written. This happens through the buffer cache and the background writer,
> so maybe the pages aren't written in the order in which they were created in
> the buffer cache, but if they are, they end up on disk (assuming that the file
> system is not fragmented) roughly like that:
>
> main table heap page 1
> toast table heap page 1
> toast table heap page .
> toast table heap page n
> main table heap page 2
> toast table heap page n+1
> toast table heap page .
> toast table heap page 2n

Well, that's assuming:
1. You're not re-using space from previously deleted/updated rows.
2. You've not got a RAID array striping writes over multiple disks
3. The underlying filesystem + buffering isn't doing anything too clever.

> Significantly later a sequential scan of the table has to be made, the
> toastable attribute is not needed for the operation. The caches are cold or
> otherwise occupied.

OK

> If the granularity of caches that are nearer to the disk in
> the cache hierarchy than the PG buffer cache is higher than the PG page size
> (i.e. a cache unit is bigger than the PG page size), then every read of a main
> table heap page will inescapably read some toast table heap pages into the
> cache (whichever cache that may be).

I think disk blocks on your more common file-systems are 4KB by default
(ext2/3 and ntfs for example). I'm not aware of any default disk-block
sizes more than the 8KB page-size of PG. Of course, the OS may read
ahead if it sees you scanning, but it will do that on a file basis.

> If all the main table heap pages were
> laid out adjecently on disk, they could be read faster and caches be polluted
> less.

True enough. The key word there though is "if" - it means that PG is
trying to out-think the filesystem, OS and hardware.

It should be easy enough to test on a particular system though.
1. pg_restore a sample table with TOASTed data.
2. pg_restore the same data but no TOASTed data.
3. cluster the table with TOASTed data (which should force a rewrite of
the whole table but not its TOASTed data)

If the timing of various selects differ hugely then there's something
worth investigating.

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2008-02-19 11:49:15 Re: Analogue to SQL Server UniqueIdentifier?
Previous Message Martijn van Oosterhout 2008-02-19 10:47:44 Re: out-of-line (TOAST) storage ineffective when loading from dump?