Quick Links

RE: join super lento

From:	"Mario Soto Cordones" <marioa(dot)soto(dot)cordones(at)gmail(dot)com>
To:	"'Hellmuth Vargas'" <hivs77(at)gmail(dot)com>
Cc:	"'Lista Postgres ES'" <pgsql-es-ayuda(at)postgresql(dot)org>
Subject:	RE: join super lento
Date:	2016-02-22 15:55:43
Message-ID:	!&!AAAAAAAAAAAYAAAAAAAAAPma+WVXT5NGnV/8+cC15lfCgAAAEAAAABGqMrHirRRCutyju6SVsA0BAAAAAA==@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-es-ayuda

Que valores tienes en estos parámetros:

max_connections

shared_buffers

effective_cache_size

work_mem

Saludos

De: pgsql-es-ayuda-owner(at)postgresql(dot)org [mailto:pgsql-es-ayuda-owner(at)postgresql(dot)org] En nombre de Anthony Sotolongo
Enviado el: lunes, 22 de febrero de 2016 12:46
Para: Hellmuth Vargas <hivs77(at)gmail(dot)com>
CC: Lista Postgres ES <pgsql-es-ayuda(at)postgresql(dot)org>
Asunto: Re: [pgsql-es-ayuda] join super lento

Puedes hacer el explain analyze con ese valor nuevo de work_mem 2048 MB
a ver que hizo con el Sort Method, pues me da la impresión que el problema esta ahí en el Sort Method

Creo que no se trata de las cantidad de JOIN, hay que detectar donde es el cuello de botella en el plan.

saludos

On 22/02/16 12:38, Hellmuth Vargas wrote:

Hola Anthony

Gracias por la pregunta: actualmente se tiene configurado así la maquina:.

cat postgresql.conf | grep -i 'work_mem'

work_mem = 16MB # min 64kB

maintenance_work_mem = 512MB # min 1MB

Aunque también probé colocando hasta 2048 MB y no cambia en mucho la verdad

set local work_mem='2048 MB'

select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud, 1 as cant

from public.ath_tecnicosv2 t

left join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion and t.fechamto=c.fecha

where fecha >= '20160218' and fechamto >= '20160218'

group by t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud,1

Group (cost=60388.03..65631.12 rows=749012 width=179)

-> Sort (cost=60388.03..60762.54 rows=749012 width=179)

Sort Key: t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario, c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud

-> Merge Join (cost=7.77..6784.44 rows=749012 width=179)

Merge Cond: ((c.fecha = t.fechamto) AND (c.identificacion = t.identificacionusuario))

-> Index Only Scan using idx_ath_cajerosv2_comp on ath_cajerosv2 c (cost=0.08..169.88 rows=9072 width=64)

Index Cond: (fecha >= '2016-02-18'::date)

-> Materialize (cost=0.11..2533.11 rows=94366 width=123)

-> Index Only Scan using idx_ath_tecnicosv2_comp on ath_tecnicosv2 t (cost=0.11..2485.92 rows=94366 width=123)

Index Cond: (fechamto >= '2016-02-18'::date)

Y pues tengo otras tablas muchísimo mas grandes (35 Millones de registros) con consultas con varios join y no se tardan tanto ( en el mismo cluster de base de datos)

2016-02-22 10:13 GMT-05:00 Anthony Sotolongo <asotolongo(at)gmail(dot)com <mailto:asotolongo(at)gmail(dot)com> >:

Hola Hellmuth, que valor tiene tu work_mem a la hora de realizar esa consulta?, pues se observa varios:
Sort Method: external merge Disk: 247448kB

lo cual indica que esta realizando esa operación de ordenamiento en disco!!!

Saludos
Anthony

On 22/02/16 11:55, Hellmuth Vargas wrote:

Hola Lista

Les tengo el siguiente desafio pues no he podido dar con el tema, tengo dos tablas

CREATE TABLE ath_tecnicosv2

(

descripcionmovimiento character varying(160),

fechamto date,

horamto character varying(8),

fechaact character varying(8),

horaact character varying(8),

usuario character varying(50),

identificacionusuario bigint,

tipomovimientosusuario character varying(25),

id bigserial NOT NULL

)

WITH (

OIDS=FALSE

);

-- tamaño: 1639200 registros

CREATE UNIQUE INDEX idx_u_ath_tecnicosv2_id

ON ath_tecnicosv2

USING btree

(id);

CREATE INDEX idx_ath_tecnicosv2_comp

ON ath_tecnicosv2

USING btree

(fechamto, identificacionusuario, descripcionmovimiento COLLATE pg_catalog."default", horamto COLLATE pg_catalog."default", fechaact COLLATE pg_catalog."default", horaact COLLATE pg_catalog."default", usuario COLLATE pg_catalog."default", tipomovimientosusuario COLLATE pg_catalog."default");

CREATE INDEX idx_ath_tecnicosv2_fecha2

ON public.ath_tecnicosv2

USING btree

(fechamto asc, identificacionusuario asc,id);

CREATE TABLE ath_cajerosv2

(

fecha date,

nombre character varying(60),

regionalath character varying(24),

regionaltdv character varying(54),

tiposolicitud character varying(10),

id_usuario character varying(4),

identificacion bigint,

usuario character varying(36),

cant bigint,

id bigserial NOT NULL

)

WITH (

OIDS=FALSE

);

-- tamaño: 132050 registros

CREATE UNIQUE INDEX idx_u_ath_cajerosv2_id

ON ath_cajerosv2

USING btree

(id);

CREATE INDEX idx_ath_cajerosv2_comp

ON ath_cajerosv2

USING btree

(fecha, identificacion, nombre COLLATE pg_catalog."default", regionalath COLLATE pg_catalog."default", regionaltdv COLLATE pg_catalog."default", tiposolicitud COLLATE pg_catalog."default");

CREATE INDEX idx_ath_cajerosv2_fecha2

ON public.ath_cajerosv2

USING btree

(fecha asc, identificacion asc,id);

-- consulta Básica Inicial:

from public.ath_tecnicosv2 t

join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion and t.fechamto=c.fecha

where fecha >= '20160218' and fechamto >= '20160218'

-- plan de ejecucion

Group (cost=60388.03..65631.12 rows=749012 width=179)

-> Sort (cost=60388.03..60762.54 rows=749012 width=179)

-> Merge Join (cost=7.77..6784.44 rows=749012 width=179)

Merge Cond: ((c.fecha = t.fechamto) AND (c.identificacion = t.identificacionusuario))

-> Index Only Scan using idx_ath_cajerosv2_comp on ath_cajerosv2 c (cost=0.08..169.88 rows=9072 width=64)

Index Cond: (fecha >= '2016-02-18'::date)

-> Materialize (cost=0.11..2533.11 rows=94366 width=123)

-> Index Only Scan using idx_ath_tecnicosv2_comp on ath_tecnicosv2 t (cost=0.11..2485.92 rows=94366 width=123)

Index Cond: (fechamto >= '2016-02-18'::date)

Alcanzamos a esperar 45 minutos por el resultado de la consulta y no había terminado.

Entonces empezamos a revisar cada tabla: cada una por separado no se demora la consulta:

-------------

-- tabla ath_tecnicosv2

select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario

from public.ath_tecnicosv2 t

where t.fechamto between '20160218' and now()

order by t.identificacionusuario,t.fechamto

Sort (cost=7682.15..7729.33 rows=94366 width=123) (actual time=192.756..216.720 rows=96462 loops=1)

Sort Key: identificacionusuario, fechamto

Sort Method: external merge Disk: 11744kB

-> Index Only Scan using idx_ath_tecnicosv2_comp on ath_tecnicosv2 t (cost=0.11..2533.11 rows=94366 width=123) (actual time=1.382..60.523 rows=96462 loops=1)

Index Cond: ((fechamto >= '2016-02-18'::date) AND (fechamto <= now()))

Heap Fetches: 0

Total runtime: 223.940 ms

-----------------------

-- tabla ath_cajerosv2

select c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud,c.identificacion

from public.ath_cajerosv2 c

where fecha between '20160218' and now()

order by c.identificacion,c.fecha

plan 3

Sort (cost=293.69..298.22 rows=9072 width=64) (actual time=9.959..10.501 rows=9046 loops=1)

Sort Key: identificacion, fecha

Sort Method: quicksort Memory: 1641kB

-> Index Only Scan using idx_ath_cajerosv2_comp on ath_cajerosv2 c (cost=0.08..174.42 rows=9072 width=64) (actual time=1.174..6.215 rows=9046 loops=1)

Index Cond: ((fecha >= '2016-02-18'::date) AND (fecha <= now()))

Heap Fetches: 0

Total runtime: 10.928 ms

por lo tanto el problema es el JOIN , realice un JOIN con solo los ID (no se demora mucho):

----------------------

select t.id <http://t.id> as tecnicos_id,c.id <http://c.id> as cajeros_id

from public.ath_tecnicosv2 t

join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion and t.fechamto=c.fecha

where fecha >= '20160218' and fechamto >= '20160218'

group by 1,2

--- plan 4

Group (cost=27378.71..28502.23 rows=749012 width=16) (actual time=14665.728..20729.334 rows=9736174 loops=1)

-> Sort (cost=27378.71..27753.21 rows=749012 width=16) (actual time=14665.725..18538.695 rows=9736174 loops=1)

Sort Key: t.id <http://t.id> , c.id <http://c.id>

Sort Method: external merge Disk: 247448kB

-> Merge Join (cost=4.34..5263.12 rows=749012 width=16) (actual time=2.168..2993.563 rows=9736174 loops=1)

Merge Cond: ((c.fecha = t.fechamto) AND (c.identificacion = t.identificacionusuario))

-> Index Only Scan using idx_ath_cajerosv2_fecha2 on ath_cajerosv2 c (cost=0.08..100.58 rows=9072 width=20) (actual time=1.027..12.524 rows=9046 loops=1)

Index Cond: (fecha >= '2016-02-18'::date)

Heap Fetches: 0

-> Materialize (cost=0.09..1081.08 rows=94366 width=20) (actual time=1.131..1006.430 rows=9738392 loops=1)

-> Index Only Scan using idx_ath_tecnicosv2_fecha2 on ath_tecnicosv2 t (cost=0.09..1033.90 rows=94366 width=20) (actual time=1.128..20.970 rows=96459 loops=1)

Index Cond: (fechamto >= '2016-02-18'::date)

Heap Fetches: 0

Total runtime: 21264.042 ms

-----

Por lo tanto, reescribí la consulta, primero haciendo el JOIN entre ID y luego vinculando las tablas con estos mismos IDS pero no funciono, se sigue demorando mucho... si hago JOIN por las tablas por separado no se demoran:

select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario

from (

select t.id <http://t.id> as tecnicos_id,c.id <http://c.id> as cajeros_id

from public.ath_tecnicosv2 t

join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion and t.fechamto=c.fecha

where fecha >= '20160218' and fechamto >= '20160218'

group by 1,2

) as x

join public.ath_tecnicosv2 t on t.id <http://t.id> =x.tecnicos_id

plan 5

Merge Join (cost=27378.79..81818.32 rows=749012 width=123) (actual time=18134.833..28281.834 rows=9736174 loops=1)

Merge Cond: (t_1.id <http://t_1.id> = t.id <http://t.id> )

-> Group (cost=27378.71..28502.23 rows=749012 width=16) (actual time=17462.700..23149.037 rows=9736174 loops=1)

-> Sort (cost=27378.71..27753.21 rows=749012 width=16) (actual time=17462.698..21212.132 rows=9736174 loops=1)

Sort Key: t_1.id <http://t_1.id> , c.id <http://c.id>

Sort Method: external merge Disk: 247448kB

-> Merge Join (cost=4.34..5263.12 rows=749012 width=16) (actual time=0.097..3588.954 rows=9736174 loops=1)

Merge Cond: ((c.fecha = t_1.fechamto) AND ( c.identificacion = t_1.identificacionusuario))

-> Index Only Scan using idx_ath_cajerosv2_fecha2 on ath_cajerosv2 c (cost=0.08..100.58 rows=9072 width=20) (actual time=0.057..14.490 rows=9046 loops=1)

Index Cond: (fecha >= '2016-02-18'::date)

Heap Fetches: 0

-> Materialize (cost=0.09..1081.08 rows=94366 width=20) (actual time=0.038..1253.519 rows=9738392 loops=1)

-> Index Only Scan using idx_ath_tecnicosv2_fecha2 on ath_tecnicosv2 t_1 (cost=0.09..1033.90 rows=94366 width=20) (actual time=0.035..25.707 rows=96459 loops=1)

Index Cond: (fechamto >= '2016-02-18'::date)

Heap Fetches: 0

-> Index Scan using idx _u_ath_tecnicosv2_id on ath_tecnicosv2 t (cost=0.09..46129.90 rows=1639202 width=131) (actual time=1.746..1950.852 rows=11281027 loops=1)

Total runtime: 28768.255 ms

-----------------

select

c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud

from (

select t.id <http://t.id> as tecnicos_id,c.id <http://c.id> as cajeros_id

from public.ath_tecnicosv2 t

join public.ath_cajerosv2 c on t.identificacionusuario=c.identificacion and t.fechamto=c.fecha

where fecha >= '20160218' and fechamto >= '20160218'

group by 1,2

) as x

join public.ath_cajerosv2 c on c.id <http://c.id> =x.cajeros_id

plan 6

Hash Join (cost=32278.13..47788.27 rows=749012 width=56) (actual time=17682.541..29054.853 rows=9736174 loops=1)

Hash Cond: (x.cajeros_id = c.id <http://c.id> )

-> Subquery Scan on x (cost=27378.71..31498.27 rows=749012 width=8) (actual time=17588.940..24201.101 rows=9736174 loops=1)

-> Group (cost=27378.71..28502.23 rows=749012 width=16) (actual time=17588.939..23095.463 rows=9736174 loops=1)

-> Sort (cost=27378.71..27753.21 rows=749012 width=16) (actual time=17588.936..21161.819 rows=9736174 loops=1)

Sort Key: t.id <http://t.id> , c_1.id <http://c_1.id>

Sort Method: external merge Disk: 247448kB

- > Merge Join (cost=4.34..5263.12 rows=749012 wid th=16) (actual time=0.076..3827.427 rows=9736174 loops=1)

Merge Cond: ((c_1.fecha = t.fechamto) AND (c_1.identificacion = t.identificacionusuario))

-> Index Only Scan using idx_ath_cajerosv2_fecha2 on ath_cajerosv2 c_1 (cost=0.08..100.58 rows=9072 width=20) (actual time=0.039..16.084 rows=9046 loops=1)

Index Cond: (fecha >= '2016-02-18'::date)

Heap Fetches: 0

-> Materialize (cost=0.09..1081.08 rows=94366 width=20) (actual time=0.031..1317.914 rows=9738392 loops=1)

-> Index Only Scan using idx_ath_tecnicosv2_fecha2 on ath_tecnicosv2 t (cost=0.09..1033.90 rows=94366 width=20) (actual time=0.029..23.961 rows=96459 loops=1)

Â Index Cond: (fechamto >= '2016-02-18'::date)

Heap Fetches: 0

-> Hash (cost=2886.20..2886.20 rows=132050 width=64) (actual time=92.976..92.976 rows=132050 loops=1)

Buckets: 8192 Batches: 4 Memory Usage: 3281kB

-> Seq Scan on ath_cajerosv2 c (cost=0.00..2886.20 rows=132050 width=64) (actual time=0.451..43.008 rows=132050 loops=1)

Total runtime: 29573.302 m

Entonces no se pueda ser!!!

incluso (a manera de comentario) replantee la consultas asi y tampoco:

with base1 as (

select t.descripcionmovimiento, t.fechamto, t.horamto, t.fechaact, t.horaact, t.usuario, t.identificacionusuario, t.tipomovimientosusuario

from (

select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t where fechamto >= '20160218'

intersect

select c.identificacion,c.fecha from public.ath_cajerosv2 c where fecha >= '20160218'

) as x

join public.ath_tecnicosv2 t on (t.identificacionusuario,t.fechamto)=(x.identificacionusuario,x.fechamto)

), base2 as (

select x.identificacionusuario,x.fechamto,c.fecha, c.nombre, c.regionalath, c.regionaltdv, c.tiposolicitud

from (

select t.identificacionusuario,t.fechamto from public.ath_tecnicosv2 t where fechamto >= '20160218'

intersect

select c.identificacion,c.fecha from public.ath_cajerosv2 c where fecha >= '20160218'

) as x

join public.ath_cajerosv2 c on (c.identificacion,c.fecha)=(x.identificacionusuario,x.fechamto)

)

select u.descripcionmovimiento, u.fechamto, u.horamto, u.fechaact, u.horaact, u.usuario, u.identificacionusuario, u.tipomovimientosusuario, u.fecha, u.nombre, u.regionalath, u.regionaltdv, u.tiposolicitud, 1 as cant

from (

select

x.descripcionmovimiento, x.fechamto, x.horamto, x.fechaact, x.horaact, x.usuario, x.identificacionusuario, x.tipomovimientosusuario, y.fecha, y.n ombre, y.regionalath, y.regionaltdv, y.tiposolicitud,y.identificacionusuario as identificacionusuario2,y.fechamto as fechamto2

from base1 as x

cross join base2 as y

) as u where u.identificacionusuario=u.identificacionusuario2 and u.fechamto=u.fechamto2

order by 2,3,7

---

Sort (cost=8260.00..8260.48 rows=961 width=1010)

Sort Key: x.fechamto, x.horamto, x.identificacionusuario

CTE base1

-> Nested Loop (cost=0.19..3929.54 rows=21842 width=123)

-> Subquery Scan on x_1 (cost=0.08..1528.40 rows=533 width=12)

-> HashSetOp Intersect (cost=0.08..1526.27 rows=533 width=12)

-> Append (cost=0.08..1422.83 rows=103438 width=12)

-> Subquery Scan on "*SELECT* 2" (cost=0.08..125.87 rows=9072 width=12)

-> Index Only Scan using idx_ath_cajerosv2_fecha on ath_cajerosv2 c (cost=0.08..89.58 rows=9072 width=12)

Index Cond: (fecha >= '2016-02-18'::date)

-> Subquery Scan on "*SELECT* 1" (cost=0.09..1296.96 rows=94366 width=12)

-> Index Only Scan using idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_1 (cost=0.09..919.50 rows=94366 width=12)

Index Cond: (fechamto >= '2016-02-18'::date)

-> Index Only Scan using idx_ath_tecnicosv2_comp on ath_tecnicosv2 t (cost=0.11..4.08 rows=107 width=123)

Index Cond: ((fechamto = x_1.fechamto) AND (identificacionusuario = x_1.identificacionusuario))

CTE base2

-> Nested Loop (cost=0.17..2175.54 rows=1760 width=68)

-> Subquery Scan on x_2 (cost=0.08..1528.40 rows=533 width=12)

-> HashSetOp Intersect (cost=0.08..1526.27 rows=533 width=12)

-> Append (cost=0.08..1422.83 rows=103438 width=12)

-> Subquery Scan on "*SELECT* 2_1" (cost=0.08..125.87 rows=9072 width=12)

-> Index Only Scan using idx_ath_cajerosv2_fecha on ath_cajerosv2 c_2 (cost=0.08..89.58 rows=9072 width=12)

Index Cond: (fecha >= '2016-02-18'::date)

-> Subquery Scan on "*SELECT* 1_1" (cost=0.09..1296.96 rows=94366 width=12)

-> Index Only Scan using idx_ath_tecnicosv2_fecha on ath_tecnicosv2 t_2 (cost=0.09..919.50 rows=94366 width=12)

Index Cond: (fechamto >= '2016-02-18'::date)

-> Index Only Scan using idx_ath_cajerosv2_comp on ath_cajerosv2 c_1 (cost=0.08..1.15 rows=17 width=64)

Index Cond: ((fecha = x_2.fechamto) AND (identificacion = x_2.identificacionusuario))

-> Hash Join (cost=22.88..2145.40 rows=961 width=1010)

Hash Cond: ((x.identificacionusuario = y.identificacionusuario) AND (x.fechamto = y.fechamto))

-> CTE Scan on base1 x (cost=0.00..174.74 rows=21842 width=638)

-> Hash (cost=14.08..14.08 rows=1760 width=384)

-> CTE Scan on base2 y (cost=0.00..14.08 rows=1760 width=384)

He cambiado la estructura de las tablas: antes todos los campos eran del tipo TEXT y por lo tanto la tablas tenían ademas tablas TOAST y supuse que el JOIN con estas era lo que estaba penalizando.

Las especificaciones, es un PostgreSQL 9.4.5, corriendo en un CentOS 6 con discos de estado solido. 12 GB de RAM, 12

Lista les agradezco sus comentratios el ideas de que puede ser el probelma.

Cordialmente, Ing. Hellmuth I. Vargas S. Esp. Telemática y Negocios por Internet

Oracle Database 10g Administrator Certified Associate

EnterpriseDB Certified PostgreSQL 9.3 Associate

Cordialmente, Ing. Hellmuth I. Vargas S. Esp. Telemática y Negocios por Internet

Oracle Database 10g Administrator Certified Associate

EnterpriseDB Certified PostgreSQL 9.3 Associate

In response to

Re: join super lento at 2016-02-22 15:45:43 from Anthony Sotolongo

Responses

Re: join super lento at 2016-02-22 16:06:35 from Hellmuth Vargas

Browse pgsql-es-ayuda by date

	From	Date	Subject
Next Message	Hellmuth Vargas	2016-02-22 16:06:35	Re: join super lento
Previous Message	Gilberto Castillo	2016-02-22 15:53:27	Re: [MASSMAIL]Re: join super lento