Sorting when LEFT JOINING to 2 same tables, even aliased.

From: "Octavio Alvarez" <alvarezp(at)alvarezp(dot)ods(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Sorting when LEFT JOINING to 2 same tables, even aliased.
Date: 2004-03-12 05:41:26
Message-ID: 2400.192.168.0.64.1079070086.squirrel@alvarezp.ods.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Hello to everybody.

I ask your help for a severe problem when doing a query that LEFT JOINs
one table to another ON a field, and then LEFT JOINs again to another
"instance" of a table ON another field which stores the same entity, but
with different meaning.

I include 3 EXPLAIN ANALYZEs:
* The first one, the target (and problematic) query, which runs in 5 to 6
minutes.
* The second one, a variation with the second LEFT JOIN commented out,
which runs in 175 to 450 ms.
* The third one, a variation of the first one with ORDER BY removed, which
gives me about 19 seconds.

Therefore, I feel like there are two problems here the one that raises the
clock to 6 minutes and one that raises it to 20 seconds. I expected a much
lower time. I checked indexes and data types already, they are all fine.
All relevant fields have BTREEs, all PKs have UNIQUE BTREE, and all id and
ext_* fields have 'integer' as data type. Each ext_* has its corresponding
REFERENCES contraint.

I translated all the table and field names to make it easier to read. I
made my best not to let any typo go through.

I'd appreciate any help.

Octavio.

=== First EXPLAIN ANALYZE ===

EXPLAIN ANALYZE
SELECT
t_materias_en_tira.id AS Id,
t_clientes.paterno || ' ' || t_clientes.materno || ' ' ||
t_clientes.nombre AS Alumno,
t_materias.nombre AS Materia,
t_materias__equivalentes.nombre AS MateriaEquivalente,
t_grupos.nombre AS Grupo,
calificacion_final AS Calificacion,
tipo AS Tipo,
eer AS EER,
total_asistencias AS TotalAsistencias,
total_clases As TotalClases
FROM
t_materias_en_tira
LEFT JOIN t_alumnos_en_semestre ON ext_alumno_en_semestre =
t_alumnos_en_semestre.id
LEFT JOIN t_alumnos ON ext_alumno = t_alumnos.id
LEFT JOIN t_clientes ON ext_cliente = t_clientes.id
LEFT JOIN t_materias ON ext_materia = t_materias.id
LEFT JOIN t_materias AS t_materias__equivalentes ON
ext_materia__equivalencia = t_materias.id
LEFT JOIN t_grupos ON ext_grupo = t_grupos.id
WHERE
t_alumnos_en_semestre.ext_ciclo = 2222
ORDER BY
Alumno, Materia;

This one gave:
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=11549.08..11552.11 rows=1210 width=112) (actual
time=311246.000..355615.000 rows=1309321 loops=1)
Sort Key: (((((t_clientes.paterno)::text || ' '::text) ||
(t_clientes.materno)::text) || ' '::text) ||
(t_clientes.nombre)::text), t_materias.nombre
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=2.000..2.000 rows=1 loops=1)
-> Hash Left Join (cost=1089.25..11487.11 rows=1210 width=112)
(actual time=83.000..19303.000 rows=1309321 loops=1)
Hash Cond: ("outer".ext_grupo = "inner".id)
-> Nested Loop Left Join (cost=1086.92..11454.53 rows=1210
width=107) (actual time=82.000..9077.000 rows=1309321 loops=1)
Join Filter: ("outer".ext_materia__equivalencia =
"outer".id) -> Hash Left Join (cost=1078.15..1181.93
rows=1210
width=93) (actual time=82.000..275.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_materia = "inner".id) ->
Merge Right Join (cost=1068.43..1154.07
rows=1210 width=71) (actual time=81.000..213.000
rows=3473 loops=1)
Merge Cond: ("outer".id = "inner".ext_cliente)
-> Index Scan using t_clientes_pkey on
t_clientes (cost=0.00..62.87 rows=1847
width=38) (actual time=10.000..34.000 rows=1847
loops=1)
-> Sort (cost=1068.43..1071.46 rows=1210
width=41) (actual time=71.000..76.000 rows=3473
loops=1)
Sort Key: t_alumnos.ext_cliente
-> Hash Left Join (cost=41.12..1006.48
rows=1210 width=41) (actual
time=9.000..61.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_alumno =
"inner".id)
-> Nested Loop (cost=0.00..944.18
rows=1210 width=41) (actual
time=3.000..36.000 rows=3473
loops=1)
-> Index Scan using
i_t_alumnos_en_semestre__ext_ciclo
on t_alumnos_en_semestre
(cost=0.00..8.63 rows=269
width=8) (actual
time=2.000..3.000 rows=457
loops=1)
Index Cond: (ext_ciclo
= $0)
-> Index Scan using
i_t_materias_en_tira__ext_alumno_en_semestre
on t_materias_en_tira
(cost=0.00..3.32 rows=12
width=41) (actual
time=0.009..0.035 rows=8
loops=457)
Index Cond:
(t_materias_en_tira.ext_alumno_en_semestre
= "outer".id)
-> Hash (cost=36.50..36.50
rows=1850 width=8) (actual
time=6.000..6.000 rows=0loops=1)
-> Seq Scan on t_alumnos
(cost=0.00..36.50 rows=1850
width=8) (actual
time=1.000..3.000 rows=1850
loops=1)
-> Hash (cost=8.77..8.77 rows=377 width=26) (actual
time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on t_materias (cost=0.00..8.77
rows=377 width=26) (actual time=0.000..1.000
rows=377 loops=1)
-> Materialize (cost=8.77..12.54 rows=377 width=22)
(actual time=0.000..0.175 rows=377 loops=3473)
-> Seq Scan on t_materias t_materias__equivalentes
(cost=0.00..8.77 rows=377 width=22) (actual
time=0.000..1.000 rows=377 loops=1)
-> Hash (cost=2.07..2.07 rows=107 width=13) (actual
time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on t_grupos (cost=0.00..2.07 rows=107
width=13) (actual time=0.000..1.000 rows=107 loops=1)

Total runtime: 356144.000 ms

=== Second EXPLAIN ANALYZE ===

EXPLAIN ANALYZE
SELECT
t_materias_en_tira.id AS Id,
t_clientes.paterno || ' ' || t_clientes.materno || ' ' ||
t_clientes.nombre AS Alumno,
t_materias.nombre AS Materia,
-- t_materias__equivalentes.nombre AS MateriaEquivalente,
t_grupos.nombre AS Grupo,
calificacion_final AS Calificacion,
tipo AS Tipo,
eer AS EER,
total_asistencias AS TotalAsistencias,
total_clases As TotalClases
FROM
t_materias_en_tira
LEFT JOIN t_alumnos_en_semestre ON ext_alumno_en_semestre =
t_alumnos_en_semestre.id
LEFT JOIN t_alumnos ON ext_alumno = t_alumnos.id
LEFT JOIN t_clientes ON ext_cliente = t_clientes.id
LEFT JOIN t_materias ON ext_materia = t_materias.id
-- LEFT JOIN t_materias AS t_materias__equivalentes ON
ext_materia__equivalencia = t_materias.id
LEFT JOIN t_grupos ON ext_grupo = t_grupos.id
WHERE
t_alumnos_en_semestre.ext_ciclo = 2222
ORDER BY
Alumno, Materia;

EXPLAIN ANALYZE
SELECT
t_materias_en_tira.id AS Id,
t_clientes.paterno || ' ' || t_clientes.materno || ' ' ||
t_clientes.nombre AS Alumno,
t_materias.nombre AS Materia,
t_materias__equivalentes.nombre AS MateriaEquivalente,
t_grupos.nombre AS Grupo,
calificacion_final AS Calificacion,
tipo AS Tipo,
eer AS EER,
total_asistencias AS TotalAsistencias,
total_clases As TotalClases
FROM
t_materias_en_tira
LEFT JOIN t_alumnos_en_semestre ON ext_alumno_en_semestre =
t_alumnos_en_semestre.id
LEFT JOIN t_alumnos ON ext_alumno = t_alumnos.id
LEFT JOIN t_clientes ON ext_cliente = t_clientes.id
LEFT JOIN t_materias ON ext_materia = t_materias.id
LEFT JOIN t_materias AS t_materias__equivalentes ON
ext_materia__equivalencia = t_materias.id
LEFT JOIN t_grupos ON ext_grupo = t_grupos.id
WHERE
t_alumnos_en_semestre.ext_ciclo = 2222;

It gave:

QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1276.49..1279.51 rows=1210 width=90) (actual
time=341.000..341.000 rows=3473 loops=1)
Sort Key: (((((t_clientes.paterno)::text || ' '::text) ||
(t_clientes.materno)::text) || ' '::text) ||
(t_clientes.nombre)::text), t_materias.nombre
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=146.000..146.000 rows=1 loops=1)
-> Hash Left Join (cost=1080.48..1214.52 rows=1210 width=90) (actual
time=209.000..284.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_grupo = "inner".id)
-> Hash Left Join (cost=1078.15..1181.93 rows=1210 width=85)
(actual time=208.000..250.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_materia = "inner".id)
-> Merge Right Join (cost=1068.43..1154.07 rows=1210
width=67) (actual time=207.000..227.000 rows=3473loops=1)
Merge Cond: ("outer".id = "inner".ext_cliente) ->
Index Scan using t_clientes_pkey on t_clientes
(cost=0.00..62.87 rows=1847 width=38) (actual
time=0.000..5.000 rows=1847 loops=1)
-> Sort (cost=1068.43..1071.46 rows=1210 width=37)
(actual time=207.000..209.000 rows=3473 loops=1)
Sort Key: t_alumnos.ext_cliente
-> Hash Left Join (cost=41.12..1006.48
rows=1210 width=37) (actual
time=152.000..196.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_alumno =
"inner".id) -> Nested Loop
(cost=0.00..944.18
rows=1210 width=37) (actual
time=146.000..177.000 rows=3473 loops=1)
-> Index Scan using
i_t_alumnos_en_semestre__ext_ciclo
on t_alumnos_en_semestre
(cost=0.00..8.63 rows=269 width=8)
(actual time=146.000..148.000
rows=457 loops=1)
Index Cond: (ext_ciclo = $0)
-> Index Scan using
i_t_materias_en_tira__ext_alumno_en_semestre
on t_materias_en_tira
(cost=0.00..3.32 rows=12 width=37)
(actual time=0.009..0.022 rows=8
loops=457)
Index Cond:
(t_materias_en_tira.ext_alumno_en_semestre
= "outer".id)
-> Hash (cost=36.50..36.50 rows=1850
width=8) (actual time=6.000..6.000 rows=0
loops=1)
-> Seq Scan on t_alumnos
(cost=0.00..36.50 rows=1850
width=8) (actual time=0.000..3.000
rows=1850 loops=1)
-> Hash (cost=8.77..8.77 rows=377 width=26) (actual
time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on t_materias (cost=0.00..8.77 rows=377
width=26) (actual time=0.000..0.000 rows=377loops=1)
-> Hash (cost=2.07..2.07 rows=107 width=13) (actual
time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on t_grupos (cost=0.00..2.07 rows=107
width=13) (actual time=0.000..0.000 rows=107 loops=1)

Total runtime: 346.000 ms

=== Third EXPLAIN ANALYZE ===

EXPLAIN ANALYZE
SELECT
t_materias_en_tira.id AS Id,
t_clientes.paterno || ' ' || t_clientes.materno || ' ' ||
t_clientes.nombre AS Alumno,
t_materias.nombre AS Materia,
t_materias__equivalentes.nombre AS MateriaEquivalente,
t_grupos.nombre AS Grupo,
calificacion_final AS Calificacion,
tipo AS Tipo,
eer AS EER,
total_asistencias AS TotalAsistencias,
total_clases As TotalClases
FROM
t_materias_en_tira
LEFT JOIN t_alumnos_en_semestre ON ext_alumno_en_semestre =
t_alumnos_en_semestre.id
LEFT JOIN t_alumnos ON ext_alumno = t_alumnos.id
LEFT JOIN t_clientes ON ext_cliente = t_clientes.id
LEFT JOIN t_materias ON ext_materia = t_materias.id
LEFT JOIN t_materias AS t_materias__equivalentes ON
ext_materia__equivalencia = t_materias.id
LEFT JOIN t_grupos ON ext_grupo = t_grupos.id
WHERE
t_alumnos_en_semestre.ext_ciclo = 2222;

Result:

QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=484.34..4470.54 rows=459 width=112) (actual
time=70.000..18241.000 rows=1309321 loops=1)
Hash Cond: ("outer".ext_grupo = "inner".id)
-> Nested Loop Left Join (cost=482.01..4456.73 rows=459 width=107)
(actual time=70.000..7912.000 rows=1309321 loops=1)
Join Filter: ("outer".ext_materia__equivalencia = "outer".id) ->
Hash Left Join (cost=473.24..554.49 rows=459 width=93)
(actual time=70.000..142.000 rows=3473 loops=1)
Hash Cond: ("outer".ext_materia = "inner".id)
-> Merge Right Join (cost=463.52..537.90 rows=459
width=71) (actual time=67.000..109.000 rows=3473 loops=1)
Merge Cond: ("outer".id = "inner".ext_cliente) ->
Index Scan using t_clientes_pkey on t_clientes
(cost=0.00..62.87 rows=1847 width=38) (actual
time=0.000..14.000 rows=1847 loops=1)
-> Sort (cost=463.52..464.67 rows=459 width=41)
(actual time=67.000..69.000 rows=3473 loops=1)
Sort Key: t_alumnos.ext_cliente
-> Merge Right Join (cost=379.40..443.23
rows=459 width=41) (actual time=34.000..57.000
rows=3473 loops=1)
Merge Cond: ("outer".id =
"inner".ext_alumno)
-> Index Scan using t_alumnos_pkey on
t_alumnos (cost=0.00..52.35 rows=1850
width=8) (actual time=0.000..4.000
rows=1850 loops=1)
-> Sort (cost=379.40..380.55 rows=459
width=41) (actual time=34.000..36.000
rows=3473 loops=1)
Sort Key:
t_alumnos_en_semestre.ext_alumno
-> Nested Loop (cost=0.00..359.11
rows=459 width=41) (actual
time=0.000..21.000 rows=3473
loops=1)
-> Index Scan using
i_t_alumnos_en_semestre__ext_ciclo
on t_alumnos_en_semestre
(cost=0.00..4.36 rows=102
width=8) (actual
time=0.000..1.000 rows=457
loops=1)
Index Cond: (ext_ciclo
= 2222)
-> Index Scan using
i_t_materias_en_tira__ext_alumno_en_semestre
on t_materias_en_tira
(cost=0.00..3.32 rows=12
width=41) (actual
time=0.004..0.026 rows=8
loops=457)
Index Cond:
(t_materias_en_tira.ext_alumno_en_semestre
= "outer".id)
-> Hash (cost=8.77..8.77 rows=377 width=26) (actual
time=2.000..2.000 rows=0 loops=1)
-> Seq Scan on t_materias (cost=0.00..8.77 rows=377
width=26) (actual time=0.000..2.000 rows=377 loops=1)
-> Materialize (cost=8.77..12.54 rows=377 width=22) (actual
time=0.000..0.163 rows=377 loops=3473)
-> Seq Scan on t_materias t_materias__equivalentes
(cost=0.00..8.77 rows=377 width=22) (actual
time=0.000..1.000 rows=377 loops=1)
-> Hash (cost=2.07..2.07 rows=107 width=13) (actual time=0.000..0.000
rows=0 loops=1)
-> Seq Scan on t_grupos (cost=0.00..2.07 rows=107 width=13)
(actual time=0.000..0.000 rows=107 loops=1)

Total runtime: 18787.000 ms

SELECT count(*) FROM t_materias_en_tira;
count
-------
41059
(1 row)

SELECT count(*) FROM t_materias;
count
-------
377
(1 row)

SELECT version();;
version
---------------------------------------------------------------------------------------
PostgreSQL 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1
(cygming special)
(1 row)

--
Octavio Alvarez.
E-mail: alvarezp(at)alvarezp(dot)ods(dot)org(dot)

Agradezco que sus correos sean enviados siempre a esta dirección.

--
Octavio Alvarez.
E-mail: alvarezp(at)alvarezp(dot)ods(dot)org(dot)

Agradezco que sus correos sean enviados siempre a esta dirección.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2004-03-12 06:09:16 Re: Sorting when LEFT JOINING to 2 same tables, even
Previous Message Tom Lane 2004-03-12 04:07:13 Re: severe performance issue with planner