RE: FW: Optimisation of IN condition

From: "Mayers, Philip J" <p(dot)mayers(at)ic(dot)ac(dot)uk>
To: "'pgsql-general(at)postgresql(dot)org'" <pgsql-general(at)postgresql(dot)org>
Subject: RE: FW: Optimisation of IN condition
Date: 2001-03-08 12:17:53
Message-ID: A0F836836670D41183A800508BAF190B35E3F2@icex1.cc.ic.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ok, after some suggestions from a colleague, I've refactored the query to
use an outer join, like this:

hdb=> select host.ip as registeredip,arptable.ip as
realip,host.mac,arptable.router,arptable.interface from host,arptable where
host.mac = arptable.mac and host.ip = arptable.ip
hdb-> union
hdb-> select NULL as registeredip,arptable.ip as
realip,arptable.mac,arptable.router,arptable.interface from arptable
hdb-> order by router,interface,mac;

registeredip | realip | mac | router
| interface
-----------------+-----------------+-------------------+--------------------
------+-----------
| 192.168.4.39 | 00:10:5a:bd:79:2f |
a-routername.domain.xx | 21
| 192.168.4.238 | 00:10:5a:bd:79:e8 |
a-routername.domain.xx | 21
192.168.4.181 | 192.168.4.181 | 00:10:5a:bd:7b:4e |
a-routername.domain.xx | 21
| 192.168.4.181 | 00:10:5a:bd:7b:4e |
a-routername.domain.xx | 21
| 192.168.4.192 | 00:10:5a:bd:7d:35 |
a-routername.domain.xx | 21
| 192.168.4.239 | 00:10:5a:bd:82:6c |
a-routername.domain.xx | 21
192.168.4.171 | 192.168.4.171 | 00:10:5a:bd:84:6d |
a-routername.domain.xx | 21
| 192.168.4.171 | 00:10:5a:bd:84:6d |
a-routername.domain.xx | 21
| 192.168.4.212 | 00:10:5a:bd:84:97 |
a-routername.domain.xx | 21
| 192.168.4.194 | 00:10:5a:bd:84:a1 |
a-routername.domain.xx | 21
192.168.4.182 | 192.168.4.182 | 00:10:5a:bd:84:c2 |
a-routername.domain.xx | 21
| 192.168.4.182 | 00:10:5a:bd:84:c2 |
a-routername.domain.xx | 21

Cool - now I can identify unregistered machines using a NULL. But, I'm
getting repeat columns. Do I have to use CORRESPONDING BY (realip,mac),
which postgres doesn't support, or is there another way?

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support |
| Centre for Computing Services |
| Imperial College |
+----------------------------------+

-----Original Message-----
From: Mayers, Philip J [mailto:p(dot)mayers(at)ic(dot)ac(dot)uk]
Sent: 08 March 2001 11:22
To: 'pgsql-general(at)postgresql(dot)org'
Subject: [GENERAL] FW: Optimisation of IN condition

I've got some tables:

create table interface (
machineid text,
mac macaddr,
primary key(mac)
);

create table arptable (
router text,
interface int2,
mac macaddr,
ip inet
);

They're big, 10k rows in interface, maybe 35k in arptable. I want to do
this:

hdb=> explain select * from arptable where mac not in (select mac from
interface);
NOTICE: QUERY PLAN:

Seq Scan on arptable (cost=0.00..407762.81 rows=4292 width=48)
SubPlan
-> Seq Scan on interface (cost=0.00..189.96 rows=8796 width=6)

But, of course, that a very expensive task. Now, it seems to me that, since
I have an index on mac in interface, I *should* in theory be able to speed
this up, in the following pseudo-code fashion:

foreach mac in arptable:
if lookup(mac,interface_pkey):
return *

Do you see what I'm getting at? Can I refashion the query somehow to take
advantage of that? The converse operation, finding registered machines:

hdb=> explain select interface.mac from arptable,interface where
interface.mac = arptable.mac;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..8838.17 rows=4292 width=12)
-> Seq Scan on arptable (cost=0.00..97.92 rows=4292 width=6)
-> Index Scan using interface_pkey on interface (cost=0.00..2.02 rows=1
width=6)

Is, of course, speedy. How can I take advantage of that. EXCEPT doesn't seem
to help:

hdb=> explain select mac from arptable except select interface.mac from
arptable,interface where interface.mac = arptable.mac;
NOTICE: QUERY PLAN:

Seq Scan on arptable (cost=0.00..37933516.98 rows=4292 width=6)
SubPlan
-> Materialize (cost=8838.17..8838.17 rows=4292 width=12)
-> Nested Loop (cost=0.00..8838.17 rows=4292 width=12)
-> Seq Scan on arptable (cost=0.00..97.92 rows=4292
width=6)
-> Index Scan using interface_pkey on interface
(cost=0.00..2.02 rows=1 width=6)

Help!

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support |
| Centre for Computing Services |
| Imperial College |
+----------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mayers, Philip J 2001-03-08 12:27:36 RE: FW: Optimisation of IN condition
Previous Message Mayers, Philip J 2001-03-08 11:22:11 FW: Optimisation of IN condition