From: | "Paulo Sergio Fernandes Parola (Keyword)" <PParola(at)vesper(dot)com(dot)br> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | JOIN performance + exporting/importing text files (how to do bett er than with \copy?) |
Date: | 2002-06-12 15:49:22 |
Message-ID: | 743CB750A13DD511A5930001023AA34106328407@ipcsrvmsr10.sp.vespersa.com.br |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I am running PostgreSQL over RedHat. I have two tables with no primary key
(not well behaved data), so that I create an index on both tables on the
fields I want to do an outer join.
With PostgreSQL I can deal very well with the large amounts of data I am
working with (GigaBytes of data with tables with millions of registers).
My problems are basically:
1) performance while executing the queries upon these not well behaved data
('wrong' datatypes, non normalized tables, duplicated data which *has* to be
imported, duplicates which sometimes *cannot be eliminated* by some business
rules thus avoiding the correct setup of primary/unique keys, etc...)
2) The time necessary to build these queries by hand (I have to import from
1 up to 'n' files into some tables, then I perform some transforms into the
columns, adjusting data types or format, and at last I perform the joins I
need to generate the desired output tables).
3) Lack of flexibility to export/import data just by using command \COPY.
My questions:
1) I create a third table as the result of the join between the two tables,
but this process takes too long (one table has aprox. 20,000 records and the
other one aprox. 600,000 records) - I have not waited for it to finish, but
it was already a couple of hours. I performed some manipulation I had to do
in the tables inside PostgreSQL and then I exported the data to text
delimited files an imported into MS-Access. By performing a query by example
I rebuilt the exact join between the tables and..... the results are
generated very quickly. Why such joins in Access generate results almost
instantly and take so long in PostgreSQL?
2) It would be much nicer if I had instead of static tables as the result of
the joins (thus wasting lots of HD space), it would be preferrable to create
the join as a view (as a bonus I can always know exactly how the join was
performed by looking at the definition of the view). But then every time I
try to select some data from the view the query will take endless time to
finish as well (as opposed to what is experienced with Access).
3) Do you know if any of the existing front-ends (for either Linux or NT,
such as PgAccess or PgAdmin II) would allow me to perform my *OUTER JOINS*
in a more visual or easy way?
4) Do you know of any tool which will allow me to substitute \COPY ? E.g.
some more powerful 'loader' as there is for Oracle (with which I could do
already some data formatting for example)? As an example I need to be able
to *import* not only text delimited files, but also 'fixed length' files
(each line of the file has the same length, with every field occupying its
precise size). Unfortunately this is the format I receive the files, so that
I had to take advantage of a little Perl script to insert the delimiters in
the files, and then use \COPY. This is unfortunately a time consuming and
error-prone procedure (each file will have different configuration of field
lengths). I also would like to be able to *export* files in this format.
TIA,
Paulo
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-06-12 15:58:32 | Re: Help understanding indexes |
Previous Message | Jorge Sarmiento | 2002-06-12 15:43:25 | optimizing SELECT with LIKE |