Query much slower from php than psql or dbeaver

From: Ekaterina Amez <ekaterina(dot)amez(at)zunibal(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Query much slower from php than psql or dbeaver
Date: 2022-01-20 14:31:06
Message-ID: CAFijohhPt+KAb+HPfWhkYvhNWhFGE3iCD_sn_88AsfUFyZ8ECQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

After receiving an Unknown Address error with *pgsql-php(at)postgresql(dot)org* I've
discovered this mailing list is catalogued as Inactive, so I'm sending my
question to this list.

I've made a php cli script that downloads a file from FTP, loads it in a
table and compares against the same table in 2 different databases that are
connected to the main one with pgsql foreign data wrapper (so there are 3
DB involved via linked servers).

I've tested the query with psql and DBeaver and it takes only milliseconds:
it returns 39 records and now there's only 16000 records on the table but
I've tested it with <100K. When I've tested my php script the same query
takes 14 minutes to return (more or less). I've checked there are no
blocking processes while runnig this query, I've analyzed the table,
dropped and recreated, added indexes... and every time I run my script it
takes those 14 minutes.

The script opens the connection and after that makes all of the operations:
some checks, load the file with COPY sentence, move data with some
additional columns to final table, and after this and without dropping the
connection it runs this query. After this long query, script iterates over
the returned records to make some other queries for every record. And
finally there are 2 more queries, that makes the reverse search: the slow
query checks diferences or non-existing records from file data in the 2
linked databases, and the las 2 queries check non-existing records from
their tables in the file. Everything goes smooth except this long-lasting
query.

This is the query:
select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as
maindb_ccccc, t1.timestamp_create as maindb_create,
t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd,
null::text as db1_sth,
t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as
db2_sth,
t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from table1 t1
left join database1_fdw.table2 t2 on t1.aaaa = t2.btatpd_aaaa
and t1.file = 'file_name.csv'
and t2.btatpd_fecha = '20220119120000'
and substring(t1.bbb from 1 for 3) in (<some_values>)
left join database2_fdw.table2 t3 on t1.aaaa = t3.btatpd_aaaa
and t1.fichero_origen = 'file_name.csv'
and t3.btatpd_fecha = '20220119120000'
and substring(t1.bbb from 1 for 3) in (<some_different_values)
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
and ((t2.eeeeeeee is null and t3.eeeeeeee is null)
or
(t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
or
(t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
)
order by t1.bbb nulls last;

I'm using PHP 5.3.3, with PG 9.6 over CentOS 6.8. I'm connecting to PG with
pg_connect and pg_query because here nobody is using PDO. And I have to
warn you that I'm not a PHP expert. Also I'm aware how old these versions
are, I'm trying to finish things to start with server migration.

This is the explain plan, in case it gives more info:
Sort (cost=5345.39..5385.35 rows=15986 width=212) (actual
time=112.315..112.318 rows=39 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff,
NULL::text, t3.eeeeeeee, t3.ffffffff
Sort Key: t1.bbb
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=1255
-> Hash Left Join (cost=237.54..2587.70 rows=15986 width=212) (actual
time=81.272..112.248 rows=39 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff,
NULL::text, t3.eeeeeeee, t3.ffffffff
Hash Cond: ((t1.aaaa)::text = (t3.btatpd_aaaa)::text)
Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND
("substring"((t1.bbb)::text, 1, 3) = ANY ('{<some_values>}'::text[])))
Rows Removed by Join Filter: 1
Filter: (((t2.eeeeeeee IS NULL) AND (t3.eeeeeeee IS NULL)) OR
((t2.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t2.ffffffff)) OR
((t3.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t3.ffffffff)))
Rows Removed by Filter: 15794
Buffers: shared hit=1252
-> Hash Left Join (cost=118.77..2408.88 rows=15988 width=149)
(actual time=71.890..101.261 rows=15820 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, t1.file, t2.eeeeeeee, t2.ffffffff
Hash Cond: ((t1.aaaa)::text = (t2.btatpd_aaaa)::text)
Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND
("substring"((t1.bbb)::text, 1, 3) = ANY
('{<some_different_values>}'::text[])))
Buffers: shared hit=1252
-> Seq Scan on public.table1 t1 (cost=0.00..2230.06
rows=15988 width=103) (actual time=0.176..19.882 rows=15817 loops=1)
Output: t1.id, t1.aaaa, t1.bbb, t1.ccccc,
t1.timestamp_create, t1.timestamp_closed, t1.sbd_bundle_id, t1.ddddddddd,
t1.file, t1.fecha_carga
Filter: (((t1.file)::text = 'file_name.csv'::text) AND
((t1.ccccc)::text = 'ACTIVE'::text))
Rows Removed by Filter: 49387
Buffers: shared hit=1252
-> Hash (cost=118.73..118.73 rows=3 width=94) (actual
time=71.699..71.699 rows=14244 loops=1)
Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
Buckets: 16384 (originally 1024) Batches: 1
(originally 1) Memory Usage: 1074kB
-> Foreign Scan on database1_fdw.table2 t2
(cost=100.00..118.73 rows=3 width=94) (actual time=3.961..67.271
rows=14244 loops=1)
Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
Remote SQL: SELECT eeeeeeee, btatpd_aaaa,
ffffffff FROM public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
-> Hash (cost=118.73..118.73 rows=3 width=94) (actual
time=6.512..6.513 rows=3051 loops=1)
Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
Buckets: 4096 (originally 1024) Batches: 1 (originally 1)
Memory Usage: 235kB
-> Foreign Scan on database2_fdw.table2 t3
(cost=100.00..118.73 rows=3 width=94) (actual time=1.494..5.857 rows=3051
loops=1)
Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
Remote SQL: SELECT eeeeeeee, btatpd_aaaa, ffffffff FROM
public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
Planning time: 2.137 ms
Execution time: 123.077 ms

Anyone can tell me why is this happening and if is there a solution to this?

Thank you for your time.

--

*Ekaterina Amez González*

ekaterina(dot)amez(at)zunibal(dot)com

*ZUNIBAL* | Idorsolo 1, 48160-Derio, Spain
<https://maps.google.com/?q=Idorsolo+1,+48160-Derio,+Spain&entry=gmail&source=g>

Tel: +34 944 977 010 <+34%20944%2097%2070%2010> | Fax: +34 944 522 81|
www.zunibal.com

*Advertencia Legal. El contenido de este mensaje y de toda la documentación
anexa es confidencial y va dirigido únicamente al destinatario del mismo.
Si usted no fuera el destinatario le solicitamos que nos informe y no
comunique su contenido a terceros, procediendo a su destrucción. / Legal
advice. This message contains confidential information for the exclusive
use of the recipient. Any unauthorised disclosure, use or dissemination,
either whole or partial, is prohibited. If you are not the intended
recipient of the message, please notify the sender as soon as possible. En
cumplimiento de la Ley Orgánica 15/1999 de Protección de Datos de Carácter
Personal, le informamos que los Datos Personales de usted que están en
nuestra Base de Datos recabados con su consentimiento, forman parte de un
fichero automatizado registrado en la Agencia Española de Protección de
Datos. Estos datos sólo serán utilizados para realizar una correcta gestión
de nuestra relación comercial. Si lo desea podrá ejercitar en todo momento
los derechos de acceso, cancelación u oposición, remitiendo un correo
electrónico a esta dirección. / According to the Law 15/1999 of Personal
Data Protection, we inform your data are included in our data
base with your approval are registered in the spanish Agency of
Data Protection. **These data are only used for our usual business
relationship. If you want you are in the right of accessing or cancelling
them, just sending an e-mail to this address.*

*Antes de imprimir este mensaje, asegúrese de que es necesario hacerlo. /
Before printing this email, assess if it is really needed.*

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Avi Weinberg 2022-01-20 14:42:13 RE: Multiple SELECT statements Using One WITH statement
Previous Message Tom Lane 2022-01-20 14:05:23 Re: Cannot find hstore operator