From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Victor Ciurus <vikcious(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Simple machine-killing query! |
Date: | 2004-10-21 15:41:48 |
Message-ID: | 16017.1098373308@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Victor Ciurus <vikcious(at)gmail(dot)com> writes:
> What I am requested to do is to keep all records from 'BIGMA' that do
> not apear in 'DIRTY'
> So far I have tried solving this by going for:
> [explain] select * from BIGMA where string not in (select * from DIRTY);
> QUERY PLAN
> ------------------------------------------------------------------------
> Seq Scan on bigma (cost=0.00..24582291.25 rows=500 width=145)
> Filter: (NOT (subplan))
> SubPlan
> -> Seq Scan on dirty (cost=0.00..42904.63 rows=2503963 width=82)
> (4 rows)
If you are using PG 7.4, you can get reasonable performance out of this
approach, but you need to jack sort_mem up to the point where the whole
DIRTY table will fit into sort_mem (so that you get a hashed-subplan
plan and not a plain subplan). If you find yourself setting sort_mem to
more than say half of your machine's available RAM, you should probably
forget that idea.
> [explain] select * from bigma,dirty where bigma.email!=dirty.email;
This of course does not give the right answer at all.
A trick that people sometimes use is an outer join:
select * from bigma left join dirty on (bigma.email=dirty.email)
where dirty.email is null;
Understanding why this works is left as an exercise for the reader
... but it does work, and pretty well too. If you're using pre-7.4
PG then this is about the only effective solution AFAIR.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Steinar H. Gunderson | 2004-10-21 15:49:23 | Re: Anything to be gained from a 'Postgres Filesystem'? |
Previous Message | Aaron Werman | 2004-10-21 15:14:14 | Re: Simple machine-killing query! |