multiple joins + Order by + LIMIT query performance issue

From: Antoine Baudoux <ab(at)taktik(dot)be>
To: pgsql-performance(at)postgresql(dot)org
Subject: multiple joins + Order by + LIMIT query performance issue
Date: 2008-05-06 15:03:44
Message-ID: 938EC908-FB9A-4BFA-950A-053BE3C87085@taktik.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I have a query that runs for hours when joining 4 tables but takes
milliseconds when joining one MORE table to the query.
I have One big table, t_event (8 million rows) and 4 small tables
(t_network,t_system,t_service, t_interface, all < 1000 rows). This
query takes a few milliseconds :
[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
inner join t_Network network on interface.network_id=network.id
where (network.customer_id=1) order by event.c_date desc limit 25

"Limit (cost=23981.18..23981.18 rows=1 width=977)"
" -> Sort (cost=23981.18..23981.18 rows=1 width=977)"
" Sort Key: this_.c_date"
" -> Nested Loop (cost=0.00..23981.17 rows=1 width=977)"
" -> Nested Loop (cost=0.00..23974.89 rows=1 width=961)"
" -> Nested Loop (cost=0.00..191.42 rows=1
width=616)"
" Join Filter: (service_s3_.system_id =
service1_.system_id)"
" -> Nested Loop (cost=0.00..9.29 rows=1
width=576)"
" -> Seq Scan on t_network
service_s4_ (cost=0.00..1.01 rows=1 width=18)"
" Filter: (customer_id = 1)"
" -> Index Scan using
interface_network_id_idx on t_interface service_s3_ (cost=0.00..8.27
rows=1 width=558)"
" Index Cond:
(service_s3_.network_id = service_s4_.id)"
" -> Seq Scan on t_service service1_
(cost=0.00..109.28 rows=5828 width=40)"
" -> Index Scan using event_svc_id_idx on t_event
this_ (cost=0.00..23681.12 rows=8188 width=345)"
" Index Cond: (this_.service_id =
service1_.id)"
" -> Index Scan using t_system_pkey on t_system
service_s2_ (cost=0.00..6.27 rows=1 width=16)"
" Index Cond: (service_s2_.id = service1_.system_id)"
[/code]

This one takes HOURS, but I'm joining one table LESS :

[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1) order by event.c_date desc limit 25

"Limit (cost=147.79..2123.66 rows=10 width=959)"
" -> Nested Loop (cost=147.79..2601774.46 rows=13167 width=959)"
" Join Filter: (service1_.id = this_.service_id)"
" -> Index Scan Backward using event_date_idx on t_event
this_ (cost=0.00..887080.22 rows=8466896 width=345)"
" -> Materialize (cost=147.79..147.88 rows=9 width=614)"
" -> Hash Join (cost=16.56..147.79 rows=9 width=614)"
" Hash Cond: (service1_.system_id = service_s2_.id)"
" -> Seq Scan on t_service service1_
(cost=0.00..109.28 rows=5828 width=40)"
" -> Hash (cost=16.55..16.55 rows=1 width=574)"
" -> Nested Loop (cost=0.00..16.55 rows=1
width=574)"
" -> Index Scan using
interface_network_id_idx on t_interface service_s3_ (cost=0.00..8.27
rows=1 width=558)"
" Index Cond: (network_id = 1)"
" -> Index Scan using t_system_pkey on
t_system service_s2_ (cost=0.00..8.27 rows=1 width=16)"
" Index Cond: (service_s2_.id =
service_s3_.system_id)"
[/code]

My understanding is that in the first case the sort is done after all
the table joins and filtering, but in the second case ALL the rows in
t_event are scanned and sorted before the join. There is an index on
the sorting column. If I remove this index, the query runs very fast.
But I still need this index for other queries.So I must force the
planner to do the sort after the join, in the second case. How can i
do that?

Thanks a lot for your help,

Antoine

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Frank van Vugt 2008-05-06 15:27:40 Re: plan difference between set-returning function with ROWS within IN() and a plain join
Previous Message Tom Lane 2008-05-06 14:22:07 Re: Seqscan problem