Re: Same query 10000x More Time

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: vijaykumarjain(dot)github(at)gmail(dot)com
Cc: AviW(at)gilat(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Same query 10000x More Time
Date: 2022-01-06 09:39:08
Message-ID: 20220106.183908.758244498024205160.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

At Thu, 6 Jan 2022 13:50:55 +0530, Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com> wrote in
> On Thu, 6 Jan 2022 at 13:13, Avi Weinberg <AviW(at)gilat(dot)com> wrote:
>
> > Hi
> >
> >
> >
> > I have postgres_fdw table called tbl_link. The source table is 2.5 GB in
> > size with 122 lines (some lines has 70MB bytea column, but not the ones I
> > select in the example)
> >
> > I noticed that when I put the specific ids in the list "where id in
> > (140,144,148)" it works fast (few ms), but when I put the same list as
> > select "where id in (select 140 as id union select 144 union select 148)"
> > it takes 50 seconds. This select union is just for the example, I
> > obviously have a different select (which by itself takes few ms but cause
> > the whole insert query to take 10000x more time)
> >
> >
> >
> > Why is that? How can I still use regular select and still get reasonable
> > response time?
> >
> >
> >
> > Thanks
> >
> >
> >
>
> couple of things:
> PostgreSQL: Documentation: 14: F.35. postgres_fdw
> <https://www.postgresql.org/docs/current/postgres-fdw.html>
> <https://www.postgresql.org/docs/current/postgres-fdw.html>when you set
> your foreign server what are your
> use_remote_estimate
> fetch_size
> params for the foreign server.
>
> you need to know there are certain restrictions on what gets pushed down to
> the remote server
> i generally use postgres/postgres_fdw.sql at master · postgres/postgres
> (github.com)
> <https://github.com/postgres/postgres/blob/master/contrib/postgres_fdw/sql/postgres_fdw.sql>
> as
> a reference
> if you predicates are not pushed down, it will bring all the rows from the
> foreign server to your local server (and fetch_size value and network io
> will add to delay)
> and given you used select * , it will be a lot of io, so maybe restrict
> only to columns needed after being filtered would help.
>
>
> you can try by running
> explain (verbose,analyze) query and then also enabling log_statement =
> 'all' / log_min_duration_statement = 0
> on the foreign server to see the actual plan for the foreign scan.
>
> That might help in trouble shooting.
>
>
> as always, i have little production exposure. If i am wrong, i can be
> corrected.

In this specific case, the FAST query doesn't contain a join and its
predicate can be pushed down to remote. On the other hand the SLOW
one contains a join. The planner considers remote join only when the
both hands of a join are on the same foreign server. Tthis is not the
case since the inner subquery is not even a foreign scan. The planner
doesn't consider the possibility that a subquery is executable
anywhere.

As the result, the local inevitably draw all rows from remote table to
join with the result of the subquery on-local, which should be quite
slow.

It could be improved, but I don't think we are going to consider that
case because the SLOW query seems like a kind of bad query, which can
be improved by rewriting to the FAST one.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Avi Weinberg 2022-01-06 10:20:49 RE: Same query 10000x More Time
Previous Message Vijaykumar Jain 2022-01-06 08:20:55 Re: Same query 10000x More Time