Quick Links

Seq scan on 10million record table.. why?

From:	Vincenzo Melandri <vmelandri(at)imolinfo(dot)it>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Seq scan on 10million record table.. why?
Date:	2012-10-30 12:15:10
Message-ID:	CAHSd9GfstpNZHaW+o34S2joPUL03QYoH8s_dMrhMm-f0B2REVw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi all

I have a problem with a data import procedure that involve the following query:

select a,b,c,d
from big_table b
join data_sequences_table ds
on b.key1 = ds.key1 and b.key2 = ds.key2
where ds.import_id=xxxxxxxxxx

The "big table" has something like 10.000.000 records ore more
(depending on the table, there are more than one of them).
The data are uploaded in 20k record blocks, and the keys are written
on "data_sequences_table". The keys are composite (key1,key2), and
every 5-10 sequences (depending on the size of the upload) the
data_sequences_table records are deleted.
I have indexes on both the key on the big table and the import_id on
the sequence table.

the query plan evualuate like this:

Merge Join (cost=2604203.98..2774528.51 rows=129904 width=20)
Merge Cond: (((( big_table.key1)::numeric) =
data_sequences_table.key1) AND ((( big_table.key2)::numeric) =
data_sequences_table.key2))
-> Sort (cost=2602495.47..2635975.81 rows=13392135 width=20)
Sort Key: ((big_table.key1)::numeric), ((big_table.key2)::numeric)
-> Seq Scan on big_table (cost=0.00..467919.35 rows=13392135 width=20)
-> Sort (cost=1708.51..1709.48 rows=388 width=32)
Sort Key: data_sequences_table.key1, data_sequences_table.key2
-> Seq Scan on data_sequences_table (cost=0.00..1691.83
rows=388 width=32)
Filter: (import_id = 1351592072::numeric)

It executes in something like 80 seconds. The import procedure has
more than 500 occurrences of this situation. :(
Why is the big table evaluated with a seq scan? The result is 0 to
20.000 records (the query returns the records that already exists and
should be updated, not inserted).. Can I do something to speed this
up?

--
Vincenzo.
Imola Informatica

Ai sensi del D.Lgs. 196/2003 si precisa che le informazioni contenute
in questo messaggio sono riservate ed a uso esclusivo del
destinatario.
Pursuant to Legislative Decree No. 196/2003, you are hereby informed
that this message contains confidential information intended only for
the use of the addressee.

Responses

Re: Seq scan on 10million record table.. why? at 2012-10-30 13:45:11 from Gabriele Bartolini
Re: Seq scan on 10million record table.. why? at 2012-10-30 13:53:00 from Shaun Thomas
Re: Seq scan on 10million record table.. why? at 2012-10-30 14:03:54 from Віталій Тимчишин

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Cesar Martin	2012-10-30 12:54:23	High %SYS CPU usage
Previous Message	Kevin Grittner	2012-10-30 11:55:54	Re: How to keep queries low latency as concurrency increases