| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Alfred Perlstein <bright(at)wintelcom(dot)net> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Unable to get acceptable performance from EXCEPT |
| Date: | 2000-05-10 22:38:49 |
| Message-ID: | 20127.957998329@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Alfred Perlstein <bright(at)wintelcom(dot)net> writes:
> =# select ref_id from ref_old except select ref_id from ref_new;
> Takes over 10 minutes, probably closer to half an hour.
> I've also tried using 'NOT IN ( select ref_id from ref_new )'
Yup. EXCEPT is effectively translated to a NOT IN, if I recall
correctly, and neither IN ( sub-select ) nor NOT IN ( sub-select )
are implemented very efficiently. Basically you get O(N^2) behavior
because the inner select is rescanned for each outer tuple.
We have a TODO list item to try to be smarter about this...
> Is there a way to formulate my SQL to get Postgresql to follow
> this algorithm [ kind of like a mergejoin ]
No, but you could try
select ref_id from ref_old where not exists
(select ref_id from ref_new where ref_id = ref_old.ref_id);
which would at least be smart enough to consider using an index
on ref_new(ref_id) instead of a sequential scan.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Vince Vielhaber | 2000-05-10 22:59:38 | Re: setproctitle() no longer used? |
| Previous Message | Alfred Perlstein | 2000-05-10 22:35:12 | Unable to get acceptable performance from EXCEPT |