From: | KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Joshua Tolley <eggyknap(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PATCH] [v8.5] Security checks on largeobjects |
Date: | 2009-07-17 01:36:11 |
Message-ID: | 4A5FD58B.5070201@ak.jp.nec.com |
Lists: | pgsql-hackers |
Below is a summary of the design proposal and the issues we currently have.
I would appreciate any comments on the proposal.
In particular, the choice of snapshot is a headache for me.
----------------
This project tries to solve two items listed at:
http://wiki.postgresql.org/wiki/Todo#Binary_Data
* Add security checks for large objects
* Allow read/write into TOAST values like large objects
= Introduction =
To implement security checks for largeobjects, we need to associate
metadata with each largeobject. However, the current data
structure of largeobjects is not suitable for managing per-object
metadata (such as the owner identifier, database ACLs ...),
because a largeobject is stored as a set of
separate page frames in the pg_largeobject system catalog.
Thus, we need to revise the data structure so that each
largeobject can be managed as a unit.
An interesting fact is the similarity in data structure between
a TOAST table and pg_largeobject.
A TOAST relation is declared as follows:
pg_toast_%u (
chunk_id oid,
chunk_seq int4,
chunk_data bytea,
unique(chunk_id, chunk_seq)
)
Definition of the pg_largeobject is as follows:
pg_largeobject(
loid oid,
pageno int4,
data bytea,
unique(loid, pageno)
)
They have the same structure, so it is quite natural
to use the TOAST mechanism to store the page frames of a largeobject.
= Design =
In my plan, the role of pg_largeobject changes to
managing the metadata of largeobjects, and it will be redefined
as follows:
CATALOG(pg_largeobject,2613)
{
Oid loowner; /* OID of the owner */
Oid lonsp; /* OID of the namespace */
aclitem loacl[1]; /* ACL of the largeobject */
Blob lodata; /* Contents of the largeobject */
} FormData_pg_largeobject;
For access control purposes, the ownership, namespace and ACLs
are necessary. In addition, Blob is a new type which supports
reading and writing part of its TOAST value.
The current lo_xxx() interfaces will act as wrapper functions that
access the pg_largeobject.lodata identified by a largeobject
handle. loread(), lowrite() and similar interfaces will support
partial accesses on the Blob type. This enables user defined relations
to hold large data using the TOAST mechanism, with reasonable resource
consumption. (Note that TOAST currently replaces all the chunks with the
same identifier, even if just a single byte changes.)
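To make the cost difference concrete, here is a minimal sketch (not part of
the patch) of the chunk arithmetic behind a partial access. The names
sketch_chunk_span() and sketch_chunk_count(), and the chunk size of 2000
bytes, are assumptions made up for illustration; they are not existing
PostgreSQL symbols or constants.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative chunk size in bytes. This is an assumption for the sketch,
 * not the value of any PostgreSQL constant. */
#define SKETCH_CHUNK_SIZE 2000

/* Inclusive range of chunk sequence numbers covered by a byte range. */
typedef struct SketchChunkSpan
{
    int32_t first_seq;
    int32_t last_seq;
} SketchChunkSpan;

/* Chunks holding bytes [offset, offset + length); requires length > 0. */
SketchChunkSpan
sketch_chunk_span(int32_t offset, int32_t length)
{
    SketchChunkSpan span;

    assert(offset >= 0 && length > 0);
    span.first_seq = offset / SKETCH_CHUNK_SIZE;
    span.last_seq = (offset + length - 1) / SKETCH_CHUNK_SIZE;
    return span;
}

/* Number of chunks needed to hold a value of total_size bytes. */
int32_t
sketch_chunk_count(int32_t total_size)
{
    assert(total_size >= 0);
    return (total_size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
}
```

With these assumed sizes, a 10-byte write at offset 4100 falls entirely
within chunk 2, while replacing a 1,000,000-byte value as a whole means
rewriting all 500 of its chunks.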
= Interfaces =
== New type ==
We need a new variable length type with the following features,
to allow users partial accesses.
* It always uses the external TOAST table, regardless of its size.
If toasted data is stored inline, we cannot update it independently
from the main table.
Inline storage does not prevent partial reads, but they are pointless
because inlined data is small enough.
* It always stores the data without any compression.
We cannot easily compute the required data offset within compressed
data. All of the toasted data would need to be decompressed, for both
reader and writer accesses.
== lo_xxx() interfaces ==
New versions of loread() and lowrite() are necessary to access
part of the toasted data within user defined tables. They can be defined
as follows:
loread(Blob data, int32 offset, int32 length)
lowrite(Blob data, int32 offset, Bytea data)
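The intended semantics of these two calls can be sketched over an in-memory
stand-in for a Blob. This is only an illustration under stated assumptions:
SketchBlob, sketch_loread() and sketch_lowrite() are hypothetical names, the
chunk size is deliberately tiny, the extra buffer and length arguments are
needed only because plain C has no Bytea, and real storage would go through
the TOAST relation rather than an array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Tiny sizes on purpose, so that chunk boundaries are easy to follow. */
#define SKETCH_CHUNK_SIZE 8
#define SKETCH_MAX_CHUNKS 16
#define SKETCH_MAX_BYTES  (SKETCH_CHUNK_SIZE * SKETCH_MAX_CHUNKS)

/* In-memory stand-in for an externally stored, uncompressed Blob value.
 * The caller must zero-initialize it, so that unwritten gaps read as zeros. */
typedef struct SketchBlob
{
    int32_t       size;     /* logical length in bytes */
    unsigned char chunks[SKETCH_MAX_CHUNKS][SKETCH_CHUNK_SIZE];
} SketchBlob;

/* Copy up to "length" bytes starting at "offset" into "out".
 * Returns the number of bytes actually read (0 at or past the end). */
int32_t
sketch_loread(const SketchBlob *blob, int32_t offset, int32_t length,
              unsigned char *out)
{
    int32_t nread = 0;

    assert(offset >= 0 && length >= 0);
    while (nread < length && offset + nread < blob->size)
    {
        int32_t pos = offset + nread;
        int32_t seq = pos / SKETCH_CHUNK_SIZE;
        int32_t off = pos % SKETCH_CHUNK_SIZE;
        int32_t n = SKETCH_CHUNK_SIZE - off;    /* rest of this chunk */

        if (n > length - nread)
            n = length - nread;
        if (n > blob->size - pos)
            n = blob->size - pos;
        memcpy(out + nread, &blob->chunks[seq][off], (size_t) n);
        nread += n;
    }
    return nread;
}

/* Overwrite "length" bytes at "offset", growing the value if needed.
 * Returns the number of chunks touched, or -1 if the data does not fit. */
int32_t
sketch_lowrite(SketchBlob *blob, int32_t offset,
               const unsigned char *data, int32_t length)
{
    int32_t nwritten = 0;
    int32_t touched = 0;

    assert(offset >= 0 && length >= 0);
    if (length > SKETCH_MAX_BYTES - offset)
        return -1;
    while (nwritten < length)
    {
        int32_t pos = offset + nwritten;
        int32_t seq = pos / SKETCH_CHUNK_SIZE;
        int32_t off = pos % SKETCH_CHUNK_SIZE;
        int32_t n = SKETCH_CHUNK_SIZE - off;    /* rest of this chunk */

        if (n > length - nwritten)
            n = length - nwritten;
        memcpy(&blob->chunks[seq][off], data + nwritten, (size_t) n);
        nwritten += n;
        touched++;
    }
    if (offset + length > blob->size)
        blob->size = offset + length;
    return touched;
}
```

In this sketch a 3-byte write at offset 6 touches only the two chunks that
straddle the boundary at byte 8; the other chunks of the value are left
alone, which is the property the new interfaces are meant to provide.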
== GRANT/REVOKE ==
When traditional largeobjects are accessed, the reader permission (SELECT)
or the writer permission (UPDATE) should be checked.
The GRANT/REVOKE statements are enhanced as follows:
GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;
This allows "kaigai" to read largeobject 1234.
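The check itself could look roughly like the following sketch. All names
here (SketchAclItem, SketchLargeObjectMeta, sketch_lo_check(), the
SKETCH_PRIV_* bits) are hypothetical, and the model is deliberately
simplified: it assumes the owner holds every privilege and it ignores
superusers, PUBLIC and role membership, which a real implementation would
have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchOid;

/* Privilege bits for the sketch; not PostgreSQL's actual ACL encoding. */
#define SKETCH_PRIV_SELECT (1u << 0)    /* needed by read accesses */
#define SKETCH_PRIV_UPDATE (1u << 1)    /* needed by write accesses */

/* One grant: which role received which privileges. */
typedef struct SketchAclItem
{
    SketchOid grantee;
    uint32_t  privs;
} SketchAclItem;

/* The per-object metadata that the access check depends on. */
typedef struct SketchLargeObjectMeta
{
    SketchOid            owner;
    const SketchAclItem *acl;
    size_t               nacl;
} SketchLargeObjectMeta;

/* True if "role" holds all privilege bits in "required" on the object. */
bool
sketch_lo_check(const SketchLargeObjectMeta *lo, SketchOid role,
                uint32_t required)
{
    uint32_t granted = 0;
    size_t   i;

    if (role == lo->owner)
        return true;    /* assumption: the owner holds every privilege */

    /* Collect everything granted directly to this role. */
    for (i = 0; i < lo->nacl; i++)
    {
        if (lo->acl[i].grantee == role)
            granted |= lo->acl[i].privs;
    }
    return (granted & required) == required;
}
```

After the GRANT above, a role with only SELECT would pass the check on the
read path and fail it on the write path.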
= Issues =
== New pg_type.typstorage ==
The variable length data must always be stored in external
storage and uncompressed. None of the existing typstorage strategies
satisfies this requirement, so we need to add a new pg_type.typstorage
strategy.
The new typstorage strategy forces:
- The given varlena data is always stored in the external toast relation.
- The given varlena data is always stored without any compression.
This costs some performance, so the existing Text or Bytea types will be
more suitable for variable length data that is not very large.
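The gap can be seen by putting the strategies side by side. In the sketch
below, 'p', 'e', 'm' and 'x' are the existing typstorage codes (plain,
external, main, extended); the code 'b' for the new strategy, the
SketchStoragePolicy struct and sketch_storage_policy() are assumptions of
mine, used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* What a storage strategy permits or requires for a varlena value. */
typedef struct SketchStoragePolicy
{
    bool may_compress;      /* value may be compressed */
    bool may_externalize;   /* value may be moved to the toast relation */
    bool force_external;    /* value must be moved out, whatever its size */
} SketchStoragePolicy;

SketchStoragePolicy
sketch_storage_policy(char typstorage)
{
    SketchStoragePolicy p = { false, false, false };

    switch (typstorage)
    {
        case 'p':           /* plain: always inline, never compressed */
            break;
        case 'e':           /* external: may move out, never compressed */
            p.may_externalize = true;
            break;
        case 'm':           /* main: compress, move out only as last resort */
            p.may_compress = true;
            p.may_externalize = true;
            break;
        case 'x':           /* extended: compress and move out */
            p.may_compress = true;
            p.may_externalize = true;
            break;
        case 'b':           /* HYPOTHETICAL code for the proposed strategy */
            p.may_externalize = true;
            p.force_external = true;
            break;
        default:
            assert(0 && "unknown typstorage code");
    }
    return p;
}
```

The closest existing strategy is 'e', which already avoids compression, but
it only permits out-of-line storage; a small value can still stay inline,
so it does not meet the first requirement.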
== Snapshot ==
The largeobject interface uses SnapshotNow for writable accesses, and
GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
uses SnapshotToast to scan the toast relation.
It seems to me SnapshotToast depends on the assumption that tuples
within a TOAST relation never have multiple versions.
When we update a toast value, the TOAST mechanism inserts the whole
variable length datum with a new chunk_id, and the older chunks are
removed by toast_delete_datum().
The TOAST pointer is updated to the new chunk_id, and its visibility
is under MVCC control.
The source code comment on HeapTupleSatisfiesToast() says:
/*
* HeapTupleSatisfiesToast
* True iff heap tuple is valid as a TOAST row.
*
* This is a simplified version that only checks for VACUUM moving conditions.
* It's appropriate for TOAST usage because TOAST really doesn't want to do
* its own time qual checks; if you can see the main table row that contains
* a TOAST reference, you should be able to see the TOASTed value. However,
* vacuuming a TOAST table is independent of the main table, and in case such
* a vacuum fails partway through, we'd better do this much checking.
*
* Among other things, this means you can't do UPDATEs of rows in a TOAST
* table.
*/
If a largeobject-like interface becomes available to update part of a TOAST
value, we can no longer keep this assumption.
As a starting point, I plan to use the result of GetActiveSnapshot()
to fetch toasted values. Are there any problems to be expected with this?
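To illustrate the concern, here is a toy model of the two visibility rules.
It is a sketch under heavy simplification, not PostgreSQL's logic: the
snapshot is reduced to a single xid horizon, aborted and in-progress
transactions, command ids and xid wraparound are ignored, and the
TOAST-style rule omits even the VACUUM-moving checks mentioned in the
comment above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID 0

/* One physical tuple version in the toast relation. */
typedef struct SketchChunkTuple
{
    uint32_t  chunk_id;
    int32_t   chunk_seq;
    SketchXid xmin;     /* transaction that inserted this version */
    SketchXid xmax;     /* transaction that replaced it, or invalid */
} SketchChunkTuple;

/* Simplified snapshot: xids below the horizon count as committed and
 * visible, all others as not yet visible. */
typedef struct SketchSnapshot
{
    SketchXid horizon;
} SketchSnapshot;

/* MVCC-style rule: inserted before the snapshot and not yet replaced. */
bool
sketch_visible_mvcc(const SketchChunkTuple *t, const SketchSnapshot *snap)
{
    bool inserted = t->xmin < snap->horizon;
    bool replaced = t->xmax != SKETCH_INVALID_XID && t->xmax < snap->horizon;

    return inserted && !replaced;
}

/* TOAST-style rule in this model: no time qualification at all. */
bool
sketch_visible_toast_like(const SketchChunkTuple *t)
{
    (void) t;
    return true;
}

/* Count visible versions of one chunk; snap == NULL selects the
 * TOAST-style rule. */
size_t
sketch_count_visible(const SketchChunkTuple *tuples, size_t ntuples,
                     uint32_t chunk_id, int32_t chunk_seq,
                     const SketchSnapshot *snap)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < ntuples; i++)
    {
        const SketchChunkTuple *t = &tuples[i];
        bool visible;

        if (t->chunk_id != chunk_id || t->chunk_seq != chunk_seq)
            continue;
        visible = (snap != NULL) ? sketch_visible_mvcc(t, snap)
                                 : sketch_visible_toast_like(t);
        if (visible)
            count++;
    }
    return count;
}
```

If transaction 9 updates a chunk in place, the relation holds two versions
with the same (chunk_id, chunk_seq). In this model the TOAST-style rule
returns both of them, while the MVCC-style rule returns exactly one for a
snapshot taken either before or after transaction 9.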
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>