Quick Links

Re: [SQL] (Ab)Using schemas and inheritance

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Jorge Godoy <jgodoy(at)gmail(dot)com>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: [SQL] (Ab)Using schemas and inheritance
Date:	2006-05-24 16:06:43
Message-ID:	44748493.5090107@magproductions.nl
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-sql

Jorge Godoy wrote:
> Em Quarta 24 Maio 2006 11:48, Alban Hertroys escreveu:
>
>>Jorge Godoy wrote:
>>
>>>Some things are really important here:
>>>
>>> - performance for operations on an individual company --- it has to be
>>> as fast as possible because this might be used by my client's clients.
>>
>>I suppose inherited tables would be faster. Both at read and write
>>operations.
>>
>>However, I think that partitioning/clustering the table over an index on
>>company_id would help a lot. Also, considering you need to keep a lot of
>>old data around, you may be able to split your data in archived and
>>recent data, keeping your indices small (partial indices maybe).
>
>
> But this could be easily done with two commands (besides creating indices
> again):
>
> ALTER TABLE x RENAME TO x_year;
> CREATE TABLE x () INHERITS FROM base.x;
>
> No need to change the rest...

CREATE TABLE sample ();
ALTER TABLE sample RENAME TO sample_2004;
CREATE TABLE sample () INHERITS FROM sample_2004;
ALTER TABLE sample RENAME TO sample_2005;
CREATE TABLE sample () INHERITS FROM sample_2005;

You really mean that yearly doubling inheritance? I've been told
(admittedly by people I don't give much credit) that inheriting tables
works by joining the underlying tables, which could seriously affect
performance after a few years of doing this.

Did you try this? It seems to be important.

Considering this case with views, you could:

CREATE INDEX sample_2004_idx ON sample (object_id) WHERE date BETWEEN
'2004-01-01'::date AND '2004-12-31'::date;

CREATE INDEX sample_2005_idx ON sample (object_id) WHERE date BETWEEN
'2005-01-01'::date AND '2005-12-31'::date;

etc.

You could extend that to also take company_id into account and put the
indices on the base tables instead. I figure that takes less maintenance.

Mind you, to make use of those indices, you need date somewhere in your
where-clauses.

In your inheritence case, if you have tables covering this year only
(that of course inherit from the complete data set somehow), you could
use the ONLY statement in your FROM clauses. You wouldn't need to bother
with date.

>>> - safety: if it becomes available externally, then one client could
>>>never see other client's data. This is a main concern. If used only
>>>internally it is desirable that only authorized employees view each
>>>client since all of them doesn't need to access all clients even if
>>>working directly with accounting.
>>
>>Safety would be about equal to your inheritence solution. You can GRANT
>>rights on the views, that can be different from the access rights on the
>>base tables. The views must evidently have access to your base tables.
>
> Of course. This way, to keep the view working, I'd have to use SELECT INTO
> when archiving data, right? IIRC, views would still refer to the old table
> if I rename them (this is a problem on my solution as well if I can't use a
> common view and make it respect the search_path, as I described on the other
> message).

By archiving you mean moving yearly data to disk via a temporary table
or some such? In that case, yes, probably.

For yearly archiving, dumping one of those inherited tables (containing
only data for a specific year, after all) could work. Never tried that
before.

>>> - easy to code on application side: other systems will be plugged to
>>>this database. The more we can do to avoid other people mistakes, the
>>>better.
>>
>>Views might be a bit easier there, as you can add derived data to your
>>results (like first name, infix and surname concatenated, dates in a
>>format specific for a companys' locale, etc.).
>
> I'll have the full data on each schema, so this is also possible.

Okay, no client data processing required, then. Nice.

>>> - easy to maintain database: if it is too painful, hacks will come and
>>>this is not a good plan before starting the project...
>>
>>This was my main reason to suggest views. For select you could probably
>>get away with SELECT * FROM table WHERE company_id = x; (Though SELECT *
>>is being frownde upon).
>
> Instead of doing SELECT * FROM schema.table; (no filter here)...

Indeed.

Hmm... thinking about this, I realize that in your case selecting all
data in a year could be faster. The planner would certainly choose a
sequential scan plan, while with all data in a single table (with views
over them) may trigger an index scan, which would probably be slower.

Depends on whether you're likely to do that, of course.

>>The update/insert/delete parts are also all rather similar. You may be
>>able to generate REPLACE VIEW statements if the base table layout
>>changes. You'd probably need to DROP and re-CREATE them, but being views
>>your data is not at risc. Nothing physically changes in your tables.
>>Hmm... Almost forgot about the RULEs that make them updatable... Those'd
>>need updating too, of course...
>
> Is it at risk with inherited tables?

Well, the data is inside those tables, where it's not with views. If you
accidentaly DROP a column in a table, the data is gone. With a view you
just don't see it anymore, but it's still there.

>>With inherited tables you would need a way to dump and restore their
>>contents, or do a whole lot of ALTER TABLE statements. You'd be
>>physically altering the tables that contain your data; there's more than
>>zero risc involved. You can't rollback DDL statements...
>
> Why? If I add a column to the base table then this new column appears on all
> inherited tables. And as I can't rollback DDL statements, I won't be risking
> breaking things when I change the view ;-)

Oh right, I forgot about that. But what happens if you want to change a
columns data type for example? I'd add a new column with the new type,
update it with the data in the column I want to change, drop the
original column and rename it. What happens if you do that when tables
inherit the one you're working on?

Also note that foreign keys to an inherited table don't work properly.
You get foreign key violations, because the constraint is looking at the
wrong table, IIRC.

>>OTOH, with inherited tables you can have company-tables that differ from
>>the base tables for a specific company; with views that'd be... difficult.
>
> I'd have to have an exclusive view and use auxiliar tables if I wanted that
> with views... For now, from what we've discussed with the client, this won't
> be the case. But it is something that we should think about for future
> expansions... If there aren't too many exceptions, then we can deal with

Sure, they always say that ;)

> that somewhat cleanly on both sides with the approach of the auxiliar table
> and exclusive view, but if there are too many exceptions adapting the table
> might be better (both are "hard" if there are too many exceptions...). I
> haven't gotten into this planning level yet.

You may be able to achieve some customization by inheriting from the
companies inherited tables again, adding a few columns. It's still a
pain, of course, as the client code will need to be customized too to
take advantage of the different table definitions.

>>>Am I right or completely wrong? :-)
>>
>>Have you considered downtime if a schema change is necessary? What would
>>cause the most; tables or views?
>
> What do you mean by a schema change? If we go this route, then this will have
> to be a core part of the database and application design. Changes here will
> be critical, so there shouldn't be changes or they should be done
> incrementally. Bigger changes will require a maintenance routine that will
> probably take the system offline for some time (from minutes to hours).
>
> We're studying what to do to be able to plan things like that.

Well, I know customers... They always want something to be different in
the end.

>>I'm pretty sure VIEWs will be faster if you need to DROP/CREATE the
>>tables, as you'll have to move your data around not to lose it... Of
>>course, you'd need to do the same for your base tables, that contain
>>_all_ your data in the updatable view case...
>
> As I said, I was thinking along the lines of renaming tables to archive them
> when needed (weekly, monthly, quarterly, yearly) and inheriting from "base"
> schema again. This looks like a very fast operation...
>
> Do you believe it will be problematic?

What I believe it to be isn't really significant, neither of us has much
experience using table inheritence or updatable views for things like
these, it seems.

One thing that comes to mind now is the timetravel contrib package.
That's another thing I plan to look into some time, but it may suit your
needs.

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

In response to

Re: [SQL] (Ab)Using schemas and inheritance at 2006-05-24 15:21:52 from Jorge Godoy

Responses

Re: [SQL] (Ab)Using schemas and inheritance at 2006-05-24 17:10:44 from Jorge Godoy

Browse pgsql-general by date

	From	Date	Subject
Next Message	Rafal Pietrak	2006-05-24 16:31:56	Re: background triggers?
Previous Message	Jimmy Salazar	2006-05-24 15:51:37	PROBLEMAS CON POSTGRES EN MAQUINAS MULTIPROCESADORES

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Jorge Godoy	2006-05-24 17:10:44	Re: [SQL] (Ab)Using schemas and inheritance
Previous Message	Jorge Godoy	2006-05-24 15:21:52	Re: [SQL] (Ab)Using schemas and inheritance