Re: NOT IN doesn't use index? (fwd)

From: Joe Conway <mail(at)joeconway(dot)com>
To: Becky Neville <rebecca(dot)neville(at)yale(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: NOT IN doesn't use index? (fwd)
Date: 2003-05-03 20:24:38
Message-ID: 3EB42586.6080301@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Becky Neville wrote:
> I think that list is actually (gulp) hard coded. It's not my query.
> I am trying to speed it up for someone else - to hopefully learn something
> in the process that isn't dependent on what version of postgres i'm
> running :)
>
> I assume it's from another table but can't find it on their data
> model at the moment. Those are all valid billing codes. The query
> is checking to see if anyone was billed under an invalid code. So if
> everything is ok, the query returns nothing.

Yeah -- that sounds like there has to be a table of valid codes
somewhere. In that case you can substitute the "valid_codes" table in
the left join where I had the subselect with all the UNIONs.
Alternatively you might find a NOT EXISTS method would work best. If
there isn't a "valid_codes" table, but that hard coded list is static,
perhaps you could build one and use that.

> But there must be more to it than that...otherwise, they could just
> add a Valid flag to the lookup table.

Well I certainly wouldn't query a whole table of historical information
over and over. Can you use and date column (suitably indexed) to just
check recent transactions (like since the last time you checked)?

> If you have any ideas for speeding it up other than using another
> table please let me know. It only takes me 9 min to run with 2 mil
> rows but it takes them 7 hours (51 mil rows in Oracle with many other
> jobs running and poor system maintenance.)

As above, are all 51 million rows recent transactions, or is that all of
eternity? If its the latter, I'd scan the whole thing once and produce a
report, or maybe a "transactions_with_invalid_codes" table.

From that point on, I'd only check the transactions since the last time
I'd checked, either based on a timestamp or even a sequence generated id
field. All you need to do is save off the max value each time you run,
and then use that as the starting point next time.

HTH,

Joe

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message brew 2003-05-03 20:30:38 Re: why is the db so slow?
Previous Message Becky Neville 2003-05-03 19:56:53 Re: NOT IN doesn't use index? (fwd)