Re: joining tables

From: Patrick Headley <pheadley(at)linxco-inc(dot)com>
To: pgadmin-support(at)lists(dot)postgresql(dot)org
Subject: Re: joining tables
Date: 2019-09-03 18:47:14
Message-ID: 419aab2d-186b-730f-4440-ebb8fbb1ce50@linxco-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support

Not having unique IDs from the source tables will present a problem in
that you could have more than one record in one of the tables with the
same data.

It looks like you will need to determine the fields that make each row
unique but common in all three tables. Then, you can match up records
from the three tables.

The next issue is if there is similar data in similar columns, which
data takes prescience. That will be something you have to decide. I
think the best way to do that is to import the most accurate table
first. If a record from the second table matches a record from the first
table, only import column values that weren't already populated by the
first import. Then do the same thing with the 3rd table and so on.

I think the overall process will be to import each CSV file into a
Postgres work table of it's own. Then, import the first table into a
master table that contains every field. Do the same with the second
import with the additional logic to only set a column value if it wasn't
already set by the first (prior) import. Then, do the same with the
third table and so on. By getting the CSV files into Postgres first, it
will be easier to manage cases where a column between the two tables is
the same data but the names are different. I think it will also perform
better.

The import of the first table into the main table will be a simple
INSERT query. For each of the rest of the tables I think I would first
run an UPDATE query with an INNER JOIN that updates column values of
matching records but only if the column isn't already populated. Then,
run an INSERT query using the second table on the left side of a LEFT
OUTER JOIN and the main table on the right side and only return records
from the left hand table that don't match any records on the right hand
table (i.e. fields in right table are null). Do the same for the rest of
the tables.

A different way to perform the INSERT query would be to use a sub query
in the WHERE clause of the second table that only returns records that
don't have matches in the main table. This will be easier if each of the
source tables has it's own unique ID across all source tables. If you
don't have unique IDs, you can use a concatenation of key field values.

SELECT columns
FROM table2
WHERE [concatenation of ] key_column(s) in table2 NOT IN (SELECT
[contenation of] key_column(s) FROM table1);

*/Patrick Headley/*
Linx Consulting, Inc.
(303) 916-5522
pheadley(at)linxco-inc(dot)com
www.linxco-inc.com

On 9/3/19 5:01 AM, TedJones wrote:
> Hi Patrick
>
> Thanks for your response. The tables that I gave were just an example and
> there is no relevance in the friend/yes or names/contact details in
> different tables.
>
> To clarity, another example would be three tables - shop1, shop2 and shop3.
> Each would have products so that would be a common column but product price
> may also be a common column. However, there would be also different column
> names for each shop. (columns with same names would be of the same data
> type).
>
> I have no control over the three tables that I get as csv files. It should
> be easy enough, if needed to add a primary key column (to each?) after
> reading in the csv files into tables.
>
> So, generally, what I am trying to achieve is to combine three tables into
> one table where some of the columns are the same and some are not. e.g table
> 1 with 12 columns; table 2 with 10 columns where 4 are the same as in table
> 1; table 3 with 4 columns where 1 is the same as in table 1 and 1 the same
> as in table 2. The number of columns in the result table would then be 12
> (table 1) + 6 (new ones from table 2) + 2 (new ones from table 3) = 20
> columns.
>
> Usually the data rows from each table will be different but not always. If
> data in a column common to all three tables e.g product name, was the same
> from all tables i.e. same product name then all the information about that
> product from the three tables would be in all the columns for that data row.
> I hope that’s clear. i.e. the difference between example1 and example2 in my
> original question.
>
> In this was in reverse it would start with the large result table and use
> three SELECT statements to create the three tables.
>
> Thanks
> Ted Jones
>
>
>
>
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-pgadmin-support-f2191615.html
>
>

In response to

Browse pgadmin-support by date

  From Date Subject
Next Message Jack Royal-Gordon 2019-09-03 22:44:12 Re: joining tables
Previous Message Edson Richter 2019-09-03 17:12:01 Re: pgAdmin - migration