Re: Finding orphan records

From: Wes <wespvp(at)syntegra(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jonel Rienton <jonel(at)rientongroup(dot)com>, 'Postgresql-General' <pgsql-general(at)postgresql(dot)org>
Subject: Re: Finding orphan records
Date: 2006-01-12 17:15:13
Message-ID: BFEBEAC1.1C5BC%wespvp@syntegra.com
Lists: pgsql-general

On 1/12/06 10:37 AM, "Greg Stark" <gsstark(at)mit(dot)edu> wrote:

> If it's only a factor of 3-4 then the merge join should be faster. If it's
> really two orders of magnitude (100x?) then the nested loop below would be
> faster. I think in 8.1 (and I think in 8.0 too) the planner is capable of
> coming up with both plans for the NOT IN query though.

Yep, it's really two orders of magnitude. A is about 4 million records. B
is around 200 million, and C is about 800 million records. That's why I
figured a sequential pass on A with index lookups on B and C would be
fastest. It took about 3.25 hours to complete the SELECT.

I'm working on the 8.1 upgrade now. My biggest concern is making sure no
bugs have crept into ecpg that would affect us (bitten once before, so
being careful). So far everything else is looking good with 8.1.

I'm pondering dumping the keys for A to a file, and B and C to another file,
sorting with uniqueness on the B/C file, then programmatically determining
which keys have been freed. I'm pretty sure this will be much faster, but I
had hoped to avoid an external process like this.
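That external approach can be sketched with standard Unix tools; the file names and key values below are made up for illustration, and in practice the key files would come from COPY or psql dumps of the real tables:

```shell
# Hypothetical sketch of the external process: dump the keys in A to one
# file and every key referenced by B or C to another, sort each with
# duplicates removed, then use comm to find keys in A that no longer
# appear in B or C (the orphans, i.e. freed keys).

printf '%s\n' 101 102 103 104 > a_keys.txt    # keys present in A
printf '%s\n' 102 101 102 104 > bc_keys.txt   # keys referenced by B and C

sort -u a_keys.txt  > a_sorted.txt
sort -u bc_keys.txt > bc_sorted.txt

# comm -23 prints lines unique to the first file: keys in A with no
# remaining reference in B or C.
comm -23 a_sorted.txt bc_sorted.txt
```

On these sample files the last command prints 103, the one key with no reference. An external sort scales well here because sort(1) spills to temporary files rather than needing the 800 million keys in memory at once.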

Wes
