| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | sean(dot)johnston(at)edgeintelligence(dot)com | 
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault) | 
| Date: | 2019-04-25 18:24:41 | 
| Message-ID: | 25734.1556216681@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs pgsql-hackers | 
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> [ this crashes if ft4 is a postgres_fdw foreign table: ]
> select exists(select c1 from ft4), avg(c1) from ft4 where c1 = (select
> max(c1) from ft4);
Hm, the max() subquery isn't necessary, this is sufficient:
select exists(select c1 from ft4), avg(c1) from ft4 where c1 = 42;
That produces a plan like
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Foreign Scan  (cost=200.07..246.67 rows=1 width=33)
   Output: ($0), (avg(ft4.c1))
   Relations: Aggregate on (public.ft4)
   Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 432))
   InitPlan 1 (returns $0)
     ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 width=0)
           Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)
Now one's first observation about that is that it's kind of dumb to send
the result of the locally-executed InitPlan over to the far end only to
read it back.  So maybe we should be thinking about how to avoid that.
We do avoid it for plain foreign scans:
regression=# explain verbose 
 select exists(select c1 from ft4), * from ft4 where c1 = 42;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Foreign Scan on public.ft4  (cost=200.03..226.15 rows=6 width=41)
   Output: $0, ft4.c1, ft4.c2, ft4.c3
   Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 = 42))
   InitPlan 1 (returns $0)
     ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 width=0)
           Remote SQL: SELECT NULL FROM "S 1"."T 3"
(6 rows)
and also for foreign joins:
regression=# explain verbose 
 select exists(select c1 from ft4), * from ft4, ft4 ft4b where ft4.c1 = 42 and ft4b.c1 = 43;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=200.03..252.93 rows=36 width=81)
   Output: $0, ft4.c1, ft4.c2, ft4.c3, ft4b.c1, ft4b.c2, ft4b.c3
   Relations: (public.ft4) INNER JOIN (public.ft4 ft4b)
   Remote SQL: SELECT r1.c1, r1.c2, r1.c3, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 3" r1 INNER JOIN "S 1"."T 3" r2 ON (((r2.c1 = 43)) AND ((r1.c1 = 42))))
   InitPlan 1 (returns $0)
     ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 width=0)
           Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)
but the code for upper-relation scans is apparently stupider than either
of those cases.
The proximate cause of the crash is that we have {PARAM 1}
(representing the output of the InitPlan) in the path's fdw_exprs, and
also the identical expression in fdw_scan_tlist, and that means that when
setrefs.c processes the ForeignScan node it thinks it should replace the
{PARAM 1} in fdw_exprs with a Var representing a reference to the
fdw_scan_tlist entry.  That would be fine if the fdw_exprs represented
expressions to be evaluated over the output of the foreign scan, but of
course they don't --- postgres_fdw uses fdw_exprs to compute values to be
sent to the remote end, instead.  So we crash at runtime because there's
no slot to supply such output to the fdw_exprs.
I was able to make the crash go away by removing this statement from
set_foreignscan_references:
        fscan->fdw_exprs = (List *)
            fix_upper_expr(root,
                           (Node *) fscan->fdw_exprs,
                           itlist,
                           INDEX_VAR,
                           rtoffset);
and we still pass check-world without that (which means we lack test
coverage, because the minimum that should happen to fdw_exprs is
fix_scan_list :-().  But I do not think that's an acceptable route to
a patch, because it amounts to having the core code know what the FDW
is using fdw_exprs for, and we explicitly disclaim any assumptions about
that.  fdw_exprs is specified to be processed the same as other
expressions in the same plan node, so I think this fix_upper_expr call
probably ought to stay like it is, even though it's not really the right
thing for postgres_fdw.  It might be the right thing for other FDWs.
(One could imagine, perhaps, having some flag in the ForeignPlan
node that tells setrefs.c what to do.  But that would be an API break
for FDWs, so it wouldn't be a back-patchable solution.)
(Actually, it seems to me that set_foreignscan_references is *already*
assuming too much about the semantics of these expressions in upper
plan nodes, so maybe we need to have a chat about that anyway.)
If we do leave it like this, then the only way for postgres_fdw to
avoid trouble is to not have any entries in fdw_exprs that exactly
match entries in fdw_scan_tlist.  So that pretty much devolves back
to what I said before: don't ship values to the far end that are
just going to be fed back as-is.  But now it's a correctness
requirement not just an optimization.
I haven't had anything to do with postgres_fdw's upper-relation-pushdown
code, so I am not sure why it's stupider than the other cases.
Thoughts anybody?
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dmitry Dolgov | 2019-04-25 20:20:59 | Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault) | 
| Previous Message | Amit Langote | 2019-04-25 14:45:53 | Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2019-04-25 18:50:09 | Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 | 
| Previous Message | Fujii Masao | 2019-04-25 18:21:28 | Re: pg_waldump and PREPARE |