Quick Links

Re: Tablespaces on tertiary media

From:	Mark Morgan Lloyd <markMLl(dot)pgsql-general(at)telemetry(dot)co(dot)uk>
To:	pgsql-general(at)PostgreSQL(dot)org
Subject:	Re: Tablespaces on tertiary media
Date:	2007-09-14 13:43:25
Message-ID:	fce361$c9o$1@pye-srv-01.telemetry.co.uk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Gregory Stark wrote:

>> Thanks. If the tables were in a tablespace that was stored on something that
>> looked like a conventional filesystem would the server code be prepared to wait
>> the minutes that it took the operating system and FUSE implementation to load
>> the tables onto disc?
>
> Ah, I see what you mean now. I think you might have a problem with the planner
> opening the files to do an lseek to measure how large they are. I'm not sure
> if that gets triggered before or after constraint exclusion. That's the only
> problem I can think of.

The size could be stored in the catalogue though. However at that point I
guess that anything that was used before constraint exclusion would have to be
in the catalogue and anything after would have to initiate retrieval from
tertiary media if it's not already cached.

>> The earlier work e.g. http://www.vldb.org/conf/1996/P156.PDF apparently warned
>> the planner about long-latency devices but that's probably unnecessary if the
>> application program was aware that a table had been partitioned by age and
>> accessing old data could be slow.
>
> Well it's not like there are any alternative plans that will avoid the need to
> access the data at all. I assume the FUSE setup will always have to load the
> entire file so there's no even any difference between indexed and sequential
> access. (Unless the table is over 1G in which case you might want to avoid
> sequential scans if index scans would avoid accessing some segments.)

I'd imagine in most cases that sequential scan time would be dwarfed by
medium-load and seek time. It would be important here that the server didn't
time out assuming that it had hit a hardware problem when in actual fact the
table was still being pulled from tape.

I'd presume that when Sarawagi (who I believe is now with IBM) was doing the
work that there wasn't a straightforward way to partition tables (as is
currently described in section 5.9 of the manual) so she had to add internal
hooks. Now granted that I don't pretend to really understand how things work
(I'm a luser, not a guru) but it seems to me that it would not be difficult to
extend the tablespace definition from

CREATE TABLESPACE tablespacename LOCATION '/directory'

to something like

CREATE TABLESPACE tablespacename LOCATION '|check_loaded.pl /directory'

where the check_loaded.pl script could check that the table was cached and
return its name when available. However I guess that the script would probably
need to see the initial lseek or whatever as well... there's probably a whole
lot of non-obvious details that I've totally overlooked.

Just my 2d-worth :-)

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

In response to

Re: Tablespaces on tertiary media at 2007-09-14 12:28:48 from Gregory Stark

Browse pgsql-general by date

	From	Date	Subject
Next Message	Chansup Byun	2007-09-14 13:47:13	How to recover database instance from a disaster
Previous Message	Albe Laurenz	2007-09-14 13:32:11	Re: Issue with uninstalling postgres 8.1.9