Unions, schemas, and design questions...

From: "Net Virtual Mailing Lists" <mailinglists(at)net-virtual(dot)com>
To: "Pgsql General" <pgsql-general(at)postgresql(dot)org>
Subject: Unions, schemas, and design questions...
Date: 2004-11-22 02:38:36
Message-ID: 20041122023836.9910@mail.net-virtual.com
Lists: pgsql-general

I've spent the last few days converting many databases into schemas
within a single database and have completed the process, but now I'm at
somewhat of an impasse as to the best way to proceed.

It is important for me to explain that each of these databases has a
rather different structure. Going forward I'm using more of an
inheritance model for each new schema, but that simply was not possible
back in the day. I hope one day to make the switch completely, but there
is not enough time to do it before this next thing I need to get done.
So with that in mind, let me explain.

Each of these schemas has two tables, a "users" table and a "proposals"
table (there are actually other tables, but I think these are sufficient
for this discussion), but the structure of these two tables differs
greatly from schema to schema. For each schema there is also a class,
developed in some language, which defines the functions necessary to
manipulate records in each of these tables (every class implements a
common core set of methods which can be called). Tied into all of this
is a user interface which allows users to log in, search through the
data, etc. At this point I have finally gotten the database to a
performance level that I am very happy with, and I am concerned about
the implications of what I now have to implement.

It is also probably important to note that each of these tables, within
each schema, has its own private sequence. I guess the only way to
resolve this is to use a single sequence for all the tables, but for
some reason that just doesn't sit right with me, because it seems to
make all these schemas dependent on each other.

Now I need to add a rather large repository of data (let's say 300,000
rows) which must be accessible from each of these schemas. The concept
is that for each schema I need to be able to tell it which selection of
records from this "global pool" it should query. The best way I can
think of doing this is with some sort of UNION query: first query the
schema's own table, then union in the global data, and as part of that
query do whatever is necessary to massage the data into the schema's
format. I might point out that "massaging the data" will in and of
itself be a rather complex task, because it amounts to an almost
on-the-fly data conversion for things like which category the proposal
is in (since each schema defines these differently), but I don't want to
think too much about those specifics right now.

The data file which feeds this "global pool" gets updated on a daily
basis. The thought of pre-processing the data and inserting the
appropriate records into each schema is not very appealing, both because
of the disk space and because of the time it would take to process the
file for 50+ different schemas (which is what this will likely grow to).
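
To make the UNION idea above concrete, here is a minimal sketch. It is
not a worked-out design: it uses Python's sqlite3 module as a stand-in
for PostgreSQL, an "a_" table-name prefix as a stand-in for one schema,
and every table and column name is hypothetical. A small per-schema
mapping table both selects which global rows the schema sees and
translates the global category into the schema's own code. The literal
"source" column is one possible way to tell rows apart when the private
sequences produce overlapping ids.

```python
import sqlite3

# All names below are hypothetical. SQLite stands in for PostgreSQL and
# the "a_" prefix stands in for one of the per-customer schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE global_proposals (id INTEGER PRIMARY KEY, title TEXT, category TEXT);
    CREATE TABLE a_proposals (id INTEGER PRIMARY KEY, title TEXT, cat_code INTEGER);
    -- Which global categories schema "a" wants, and how to translate them.
    CREATE TABLE a_category_map (global_category TEXT PRIMARY KEY, cat_code INTEGER);

    INSERT INTO global_proposals VALUES
        (1, 'Bridge repair', 'civil'),
        (2, 'Payroll system', 'software'),
        (3, 'Road survey', 'civil');
    INSERT INTO a_proposals VALUES (1, 'Local proposal', 10);
    INSERT INTO a_category_map VALUES ('civil', 20);
""")

def search_proposals(conn):
    """Return schema-local rows plus the selected, converted global rows."""
    return conn.execute("""
        SELECT 'local' AS source, p.id, p.title, p.cat_code
        FROM a_proposals p
        UNION ALL
        -- The join both filters the global pool and converts the category.
        SELECT 'global' AS source, g.id, g.title, m.cat_code
        FROM global_proposals g
        JOIN a_category_map m ON m.global_category = g.category
        ORDER BY source, id
    """).fetchall()

rows = search_proposals(conn)
for row in rows:
    print(row)
```

Because the conversion happens inside the query, nothing from the daily
file has to be copied into each schema; only the small mapping table is
stored per schema.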

Oh, and to add to it, each schema must be able to copy a record from the
global data into its local space, where a user can modify it. From that
point on, the local version of that record needs to override the global
one.
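
One way the override requirement might be expressed, again only as a
sketch under assumptions: sqlite3 stands in for PostgreSQL, the "a_"
prefix stands in for one schema, and the "global_id" column and helper
functions are hypothetical. A local copy remembers which global row it
came from, and the global half of the UNION skips any row that already
has a local copy.

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE global_proposals (id INTEGER PRIMARY KEY, title TEXT);
    -- global_id is NULL for purely local rows, else the id of the copied row.
    CREATE TABLE a_proposals (id INTEGER PRIMARY KEY, title TEXT, global_id INTEGER);

    INSERT INTO global_proposals VALUES (1, 'Bridge repair'), (2, 'Road survey');
    INSERT INTO a_proposals VALUES (1, 'Local proposal', NULL);
""")

def copy_to_local(conn, global_id):
    """Copy one global row into the schema's own table so it can be edited."""
    conn.execute(
        "INSERT INTO a_proposals (title, global_id) "
        "SELECT title, id FROM global_proposals WHERE id = ?",
        (global_id,),
    )

def visible_titles(conn):
    """Local rows, plus global rows that have no local copy overriding them."""
    rows = conn.execute("""
        SELECT title FROM a_proposals
        UNION ALL
        SELECT g.title FROM global_proposals g
        WHERE NOT EXISTS (
            SELECT 1 FROM a_proposals p WHERE p.global_id = g.id
        )
        ORDER BY title
    """).fetchall()
    return [r[0] for r in rows]

before = visible_titles(conn)
copy_to_local(conn, 2)
conn.execute(
    "UPDATE a_proposals SET title = 'Road survey (edited)' WHERE global_id = 2"
)
after = visible_titles(conn)
print(before)
print(after)
```

An index on the hypothetical global_id column would probably be needed
to keep the NOT EXISTS check cheap against a pool of 300,000 rows.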

I am not asking for a solution to all of this, just some thoughts on
possible strategies for coping with this sort of thing while retaining
the performance.

Thanks!

- Greg
