query takes 65 times longer if I add 1 column (explain attached)

From: "Dan Weber" <weberdan(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: query takes 65 times longer if I add 1 column (explain attached)
Date: 2006-11-02 17:12:25
Message-ID: 5dfa06e10611020912m1ff63bc3j516b7488914978cb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I've got a query that takes a very long time to execute if I add an
additional column to the SELECT clause.

Here is Query A, the slow one. I've numbered the rows
of the EXPLAIN for convenience.

psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
string_one, string_two FROM table_one LEFT JOIN table_two USING(id)
WHERE TRUE AND src_ip = '192.168.10.128' ORDER BY timestamp DESC
LIMIT 10;

QUERY PLAN

-------------------------------------------------------------------

a1. Limit (cost=18954.93..18954.95 rows=10 width=136)
(actual time=169038.823..169038.851 rows=10 loops=1)

a2. -> Sort (cost=18954.93..18959.75 rows=1929 width=136)
(actual time=169038.816..169038.825 rows=10 loops=1)

a3. Sort Key: table_one."timestamp"

a4. -> Nested Loop Left Join
(cost=0.00..18849.66 rows=1929 width=136)
(actual time=862.661..169036.636 rows=179 loops=1)

a5. Join Filter: ("outer".id = "inner".id)

a6. -> Index Scan using table_one_type_key on
table_one (cost=0.00..17.70 rows=3 width=72)
(actual time=0.083..864.386 rows=179 loops=1)

a7. Index Cond: (src_ip = '192.168.10.128'::inet)

a8. -> Seq Scan on table_two (cost=0.00..4669.92
rows=128592 width=68)
(actual time=842.166..933.496 rows=4202 loops=179)

Total runtime: 169039.368 ms
(9 rows)

Time: 169041.495 ms

It takes almost 3 minutes to run. If I knock off either string_one or
string_two, the query runs in <3 seconds. Here is Query B:

psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
string_one FROM table_one LEFT JOIN table_two USING(id) WHERE TRUE
AND src_ip = '192.168.10.128' ORDER BY timestamp DESC LIMIT 10;

QUERY PLAN

-------------------------------------------------------------------

b1. Limit (cost=17617.45..17617.47 rows=10 width=104
(actual time=2644.975..2645.003 rows=10 loops=1)

b2. -> Sort (cost=17617.45..17622.27 rows=1929 width=104)
(actual time=2644.968..2644.978 rows=10 loops=1)

b3. Sort Key: table_one."timestamp"

b4. -> Nested Loop Left Join (cost=5810.75..17512.19
rows=1929 width=104)
(actual time=956.129..2643.458 rows=179 loops=1)

b5. Join Filter: ("outer".id = "inner".id)

b6. -> Bitmap Heap Scan on table_one
(cost=7.24..13.72 rows=3 width=72)
(actual time=39.745..67.818 rows=179 loops=1)

b7. Recheck Cond: (src_ip = '192.168.10.128'::inet)

b8. -> Bitmap Index Scan on table_one_type_key
(cost=0.00..7.24 rows=3 width=0)
(actual time=29.399..29.399 rows=179 loops=1)

b9. Index Cond: (src_ip = '192.168.10.128'::inet)

b10. -> Materialize (cost=5803.51..8094.43
rows=128592 width=36)
(actual time=4.573..9.462 rows=4202 loops=179)

b11. -> Seq Scan on table_two
(cost=0.00..4669.92 rows=128592 width=36)
(actual time=818.216..895.127
rows=4202 loops=1)

Total runtime: 2646.899 ms
(12 rows)

If we compare each EXPLAIN, they're about the same down through line
5, the Join Filter. But Query A joins things from an "Index Scan" and
a "Seq Scan," while Query B joins things from a "Bitmap Index Scan"
and a "Materialize."

1. Why is the additional column causing such a big change in the
searching technique, given that the column is not involved in
the WHERE at all?

2. Besides indices, how can I coerce Postgres to act like Query
B instead of like Query A?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Anton P. Linevich 2006-11-02 17:16:24 Postgres-R
Previous Message brian 2006-11-02 16:56:43 Re: Weird double single quote issue