From: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Oddity in handling of cached plans for FDW queries |
Date: | 2016-07-12 12:40:08 |
Message-ID: | d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I noticed odd behavior in invalidating a cached plan with a foreign join
due to user mapping changes. Consider:
postgres=# create extension postgres_fdw;
postgres=# create server loopback foreign data wrapper postgres_fdw
options (dbname 'postgres');
postgres=# create user mapping for public server loopback;
postgres=# create table t1 (a int, b int);
postgres=# insert into t1 values (1, 1);
postgres=# create foreign table ft1 (a int, b int) server loopback
options (table_name 't1');
postgres=# analyze ft1;
postgres=# create view v1 as select * from ft1;
postgres=# create user v1_owner;
postgres=# alter view v1 owner to v1_owner;
postgres=# grant select on ft1 to v1_owner;
postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a;
postgres=# explain verbose execute join_stmt;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..102.04 rows=1 width=16)
Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
Relations: (public.ft1) INNER JOIN (public.ft1)
Remote SQL: SELECT r1.a, r1.b, r5.a, r5.b FROM (public.t1 r1 INNER
JOIN public.t1 r5 ON (((r1.a = r5.a))))
(4 rows)
That's great.
postgres=# create user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
QUERY PLAN
------------------------------------------------------------------------------
Nested Loop (cost=200.00..202.07 rows=1 width=16)
Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
Join Filter: (ft1.a = ft1_1.a)
-> Foreign Scan on public.ft1 (cost=100.00..101.03 rows=1 width=8)
Output: ft1.a, ft1.b
Remote SQL: SELECT a, b FROM public.t1
-> Foreign Scan on public.ft1 ft1_1 (cost=100.00..101.03 rows=1
width=8)
Output: ft1_1.a, ft1_1.b
Remote SQL: SELECT a, b FROM public.t1
(9 rows)
That's also great. Since ft1 and v1 use different user mappings, the
join should not be pushed down.
postgres=# drop user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
QUERY PLAN
------------------------------------------------------------------------------
Nested Loop (cost=200.00..202.07 rows=1 width=16)
Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
Join Filter: (ft1.a = ft1_1.a)
-> Foreign Scan on public.ft1 (cost=100.00..101.03 rows=1 width=8)
Output: ft1.a, ft1.b
Remote SQL: SELECT a, b FROM public.t1
-> Foreign Scan on public.ft1 ft1_1 (cost=100.00..101.03 rows=1
width=8)
Output: ft1_1.a, ft1_1.b
Remote SQL: SELECT a, b FROM public.t1
(9 rows)
Seems odd to me. Both relations use the same user mapping as before, so
the join should be pushed down.
Another thing I noticed is that the code in plancache.c would invalidate
a cached plan with a foreign join due to user mapping changes even in
the case where user mappings are meaningless to the FDW.
To fix the first, I'd like to propose (1) replacing the existing
has_foreign_join flag in the CachedPlan data structure with a new flag,
say uses_user_mapping, that indicates whether a cached plan uses any
user mapping regardless of whether the cached plan has foreign joins or
not (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2)
invalidating the cached plan if the uses_user_mapping flag is true. I
thought that we invalidate the cached plan if the flag is true and there
is a possibility of performing foreign joins, but it seems to me that
updates on the corresponding catalog are infrequent enough that such a
detailed tracking is not worth the effort. One benefit of using the
proposed approach is that we could make the FDW's handling of user
mappings in BeginForeignScan more efficient; currently, there is
additional overhead caused by catalog re-lookups to obtain the user
mapping information for the simple-foreign-table-scan case where user
mappings mean something to the FDW as in postgres_fdw (probably, updates
on the catalogs are infrequent, though), but we could improve the
efficiency by using the validated user mapping information created at
planning time for that case as well as for the foreign-join case. For
that, I'd like to propose storing the validated user mapping information
into ForeignScan, not fdw_private. There is a discussion about using an
ExtensibleNode [1] for that, but the proposed approach would make the
FDW's job much simpler.
I don't think the above change is sufficient to fix the second. The
root reason for that is that since currently, we allow the user mapping
OID (rel->umid) to be InvalidOid in two cases: (1) user mappings mean
something to the FDW but it can't get any user mapping at planning time
and (2) user mappings are meaningless to the FDW, we cannot distinguish
these two cases. So, I'd like to introduce a new callback routine to
specify that user mappings mean something to the FDW as proposed by Tom
[2], and use that to reject the former case, which allows us to set the
above uses_user_mapping flag appropriately, ie, set the flag to true
only if user mapping changes require forcing a replan. This would
change the postgres_fdw's behavior that it allows to prepare statements
involving foreign tables without any user mappings as long as those
aren't required at planning time, but I'm not sure that it's a good idea
to keep that behavior because ISTM that such a behavior would make
people sloppy about creating user mappings, which could lead to latent
security problems [2].
Attached is a proposed patch for that.
Best regards,
Etsuro Fujita
[1]
https://www.postgresql.org/message-id/CA%2BTgmoZK6BB4RTb5paz45Vme%3Dq6Z3D7FF2-VKdVyQCS1ps-cGw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/18299.1457923919%40sss.pgh.pa.us
Attachment | Content-Type | Size |
---|---|---|
user-mapping-handling.patch | text/x-diff | 35.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2016-07-12 13:22:25 | Re: Showing parallel status in \df+ |
Previous Message | PostgreSQL - Hans-Jürgen Schönig | 2016-07-12 10:46:25 | Re: remove checkpoint_warning |