Re: is (not) distinct from

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Johann Spies <johann(dot)spies(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: is (not) distinct from
Date: 2017-03-07 15:04:25
Message-ID: ccbfafd3-2f26-8f11-4a0b-ed7d4723c29e@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 03/07/2017 01:22 AM, Johann Spies wrote:
> Thanks (again Adrian) and Tom.
>
> The situation was that I had a table with 731million records which I
> wanted to copy into a partitioned one where there was a unique
> constraint on the fields used in my query.
>
> The "backup" table was the single one.
>
> While inserting into the partitioned table from the backup one, several
> (about 120000) records failed to insert.
>
> I wanted to find out which records were involved and found that some had
> "'" characters in the values which broke some of the functions used to
> do some calculations..
>
> As there were fields that might have null values I have tried the "is
> not distinct from".
>
> Both sides of the query had primary keys and I did not use group by.
> That was why I used "distinct".

Would it not be easier to use a LEFT JOIN between the original
table(backup) and the new table:

https://www.postgresql.org/docs/9.6/static/sql-select.html

join_type

"LEFT OUTER JOIN returns all rows in the qualified Cartesian product
(i.e., all combined rows that pass its join condition), plus one copy of
each row in the left-hand table for which there was no right-hand row
that passed the join condition. This left-hand row is extended to the
full width of the joined table by inserting null values for the
right-hand columns. Note that only the JOIN clause's own condition is
considered while deciding which rows have matches. Outer conditions are
applied afterwards."

So something like:

SELECT
a.pk
FROM
original_table AS a
LEFT JOIN --The OUTER is not required
new_table AS b
ON
a.pk = b.pk
WHERE
b.pk IS NULL

That would show all the rows in the original table that where not
transferred over.

>
> Anyhow in the end, I made some progress with a modified query:
>
> where
> s.citing_article = A.citing_article
> and
> s.cited_article != A.cited_article
> and
> s.pubyear is distinct from A.pubyear
> and
> s.year_cited is distinct from A.year_cited
> and
> s.cited_author is distinct from A.cited_author
> and
> regexp_replace(s.cited_title, $$'$$, $$''$$,'g') is distinct from
> regexp_replace(A.cited_title, $$'$$, $$''$$,'g')
> and
> regexp_replace(s.cited_work, $$'$$, $$''$$,'g') is distinct
> from regexp_replace(A.cited_work, $$'$$, $$''$$,'g')
> and
> s.doi is distinct from A.doi
>
> Regards.
> Johann
>
> --
> Because experiencing your loyal love is better than life itself,
> my lips will praise you. (Psalm 63:3)

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Caleb Cushing 2017-03-07 20:04:40 Re: Feature request - psql --quote-variable
Previous Message Stephen Frost 2017-03-07 14:32:45 Re: intentional or oversight? pg_dump -c does not restore default priviliges on schema public