| From: | Дмитрий Васильев <dmitry(dot)vasil(at)gmail(dot)com> | 
|---|---|
| To: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Bad query plan | 
| Date: | 2011-07-24 14:06:40 | 
| Message-ID: | CAKNQRbz8-XYcSmoK8u3S0BvVVk3OdXve7NG20eNwB5p5xms8MQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
I have a problem with poor query plan.
My PostgreSQL is "PostgreSQL 8.4.8, compiled by Visual C++ build 1400,
32-bit" installed by EnterpriseDB installer on Windows 7 32 bit.
Steps to reproduce:
Start with fresh installation and execute the following:
drop table if exists small;
drop table if exists large;
CREATE TABLE small
(
  id bigint,
  primary key(id)
);
CREATE TABLE large
(
  id bigint,
  primary key(id)
);
--Insert 100000 rows into large
CREATE or replace FUNCTION populate_large() RETURNS bigint AS $$
DECLARE
    id1 bigint := 0;
BEGIN
    LOOP
        insert into large(id) values(id1);
        id1 := id1 +1;
        if id1>100000 then
            exit;
        end if;
    END LOOP;
    return id1;
END
$$ LANGUAGE plpgsql;
--Insert 1000 rows into small
CREATE or replace FUNCTION populate_small() RETURNS bigint AS $$
DECLARE
    id1 bigint := 0;
BEGIN
    LOOP
        insert into small(id) values(id1);
        id1 := id1 +1;
        if id1>1000 then
            exit;
        end if;
    END LOOP;
    return id1;
END
$$ LANGUAGE plpgsql;
select populate_large(),populate_small();
analyze;
Then execute
explain analyze insert into large(id) select id from small where id
not in(select id from large);
It gives
"Seq Scan on small  (cost=1934.01..823278.28 rows=500 width=8) (actual
time=6263.588..6263.588 rows=0 loops=1)"
"  Filter: (NOT (SubPlan 1))"
"  SubPlan 1"
"    ->  Materialize  (cost=1934.01..3325.02 rows=100001 width=8)
(actual time=0.007..3.012 rows=501 loops=1001)"
"          ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001
width=8) (actual time=0.010..5.810 rows=1001 loops=1)"
"Total runtime: 6263.703 ms"
But
explain analyze insert into large(id) select id from small where not
exists (select id from large l where small.id=l.id);
exeutes much faster:
"Merge Anti Join  (cost=0.00..85.58 rows=1 width=8) (actual
time=15.793..15.793 rows=0 loops=1)"
"  Merge Cond: (small.id = l.id)"
"  ->  Index Scan using small_pkey on small  (cost=0.00..43.27
rows=1001 width=8) (actual time=0.025..3.515 rows=1001 loops=1)"
"  ->  Index Scan using large_pkey on large l  (cost=0.00..3050.28
rows=100001 width=8) (actual time=0.017..2.932 rows=1001 loops=1)"
"Total runtime: 15.863 ms"
Both queries are semantically the same.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2011-07-24 16:02:24 | Re: Bad query plan | 
| Previous Message | alan | 2011-07-23 16:23:48 | insert |