Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Date: 2014-06-12 19:32:39
Message-ID: VisenaEmail.20c.1d6f9efc3aa09035.14691911e5e@tc7-on
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

På torsdag 12. juni 2014 kl. 16:02:39, skrev David G Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com <mailto:david(dot)g(dot)johnston(at)gmail(dot)com>>: Note - numbers
do not relate to your questions. This pretty much answers 1
and 3.

1. I'd ensure that project name/id on a table can never be null by creating
a dummy project that means "none assigned".
2. I would avoid full join.  To do so I'd cross join a distinct list of
companies with a distinct list of projects.
3. Against the join in 2 you then left join three times, once for each
product table.
4. In the final result any product not having a correspond company/project
would have its value coalesced to zero.

The end result is a table without any NULL and, in the case of your
explanatory data, 9 rows - two of which would be all zeros: (c3,p1) and
(c3,p,2)

If you cannot change the raw data I'd use CTE/WITH to normalize the data
according to 1 and the use these CTEs in the rest of the query.  I would
also do 2 is a CTE then 3 would be normal subqueries or relation references
as necessary.

David J.   Hi and thanks for input.   I cannot change the data so I have to
deal with NULLs.   Not all companies or projects are in the result, only the
ones involved having fruits, by combining the derived tables. I therefore don't
see how I effectively can build up a list of distinct companies and projects to
join with.   I have a simplified version (full schema below) here which shows
that there's an error in the query (when one extra derived table is added to
the query):   SELECT
    coalesce(apl.company_name, bns.company_name, pns.company_name,
prs.company_name) AS company_name,
    coalesce(apl.project_name, bns.project_name, pns.project_name,
prs.project_name) AS project_name,
    apl.qty                                                        AS
num_apples,
    bns.qty                                                        AS
num_bananas,
    pns.qty                                                        AS
num_pineapples,
    prs.qty                                                        AS
num_pears FROM       company_apples AS apl       FULL OUTER JOIN
company_bananas AS bns
        ON apl.company_id = bns.company_id
           AND (apl.project_id IS NULL AND bns.project_id IS NULL OR
apl.project_id = bns.project_id)       FULL OUTER JOIN company_pineapples AS pns
            ON apl.company_id = pns.company_id
               AND (apl.project_id IS NULL AND pns.project_id IS NULL OR
apl.project_id = pns.project_id)       FULL OUTER JOIN company_pears AS prs
            ON apl.company_id = prs.company_id
               AND (apl.project_id IS NULL AND prs.project_id IS NULL OR
apl.project_id = prs.project_id)   ORDER BY company_name ASC, project_name ASC
NULLS FIRST;     This gives the result:   # company_name project_name num_apples
num_bananas num_pineapples num_pears 1 C1 NULL 2 2 10 NULL 2 C1 P1 5 12 NULL
NULL 3 C1 P2 2 NULL NULL NULL 4 C2 NULL 3 NULL 10 NULL 5 C2 P1 3 NULL NULL NULL
6 C2 P2 3 NULL NULL NULL 7 C3 NULL NULL 8 NULL NULL 8 C3 NULL NULL NULL NULL 7
As you see, there are two rows for C3, which should have been 1, with
num_bananas=8 and num_pears=7. There has to be something wrong with my FULL
OUTER JOINs but I don't know what....   Anyone knows how to write the correct
query for this?   Thanks.     Full schame and example-data:   drop table if
exists company_apples;
drop table if exists company_bananas;
drop table if exists company_pineapples; CREATE TABLE company_apples
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
); CREATE TABLE company_bananas
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
); CREATE TABLE company_pineapples
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
); CREATE TABLE company_pears
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (1, 'C1', null, null, 2);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (1, 'C1', 1, 'P1', 5);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (1, 'C1', 2, 'P2', 2);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (2, 'C2', null, null, 3);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (2, 'C2', 1, 'P1', 3);
INSERT INTO company_apples (company_id, company_name, project_id,
project_name, qty) VALUES (2, 'C2', 2, 'P2', 3); INSERT INTO company_bananas
(company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1',
null, null, 2);
INSERT INTO company_bananas (company_id, company_name, project_id,
project_name, qty) VALUES (1, 'C1', 1, 'P1', 12);
INSERT INTO company_bananas (company_id, company_name, project_id,
project_name, qty) VALUES (3, 'C3', null, null, 8); INSERT INTO
company_pineapples (company_id, company_name, project_id, project_name, qty)
VALUES (1, 'C1', null, null, 10);
INSERT INTO company_pineapples (company_id, company_name, project_id,
project_name, qty) VALUES (2, 'C2', null, null, 10); INSERT INTO company_pears
(company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3',
null, null, 7);   select company_name, project_name, qty from company_apples
order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_bananas order by
company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pineapples order by
company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pears order by
company_name ASC, project_name ASC nulls first; -- Andreas Jospeh Krogh CTO /
Partner - Visena AS Mobile: +47 909 56 963 andreas(at)visena(dot)com
<mailto:andreas(at)visena(dot)com> www.visena.com <https://www.visena.com>
<https://www.visena.com>  

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David G Johnston 2014-06-12 19:49:59 Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Previous Message David G Johnston 2014-06-12 14:02:39 Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables